Open Agpa88 opened 5 years ago
Can you please try running it with fewer workers? The amount of RAM allocated to each worker is sometimes insufficient for a given STAR alignment, which can produce the outcome you experienced. For example, if you have 64 fastq files being aligned and you give 128 GB of RAM to 40 workers, the resulting 128/40 (about 3.2 GB per worker) may be enough RAM for some alignments but not others.
@Agpa88
In the meantime, would you please attach the *.fastqLog.out file for one of the failed samples? You will find this file under outdir/result/tempFolder.
@demis001
@jshousephd Thanks! I will definitely try it
@demis001 yes sure, here it is: plate5_D10.fastqLog.out.txt Thanks!
@Agpa88
Here is the error:
EXITING: fatal error trying to allocate genome arrays, exception thrown: std::bad_alloc
Possible cause 1: not enough RAM. Check if you have enough RAM 2782013395 bytes
Possible cause 2: not enough virtual memory allowed with ulimit. SOLUTION: run ulimit -v 2782013395
I think the @jshousephd recommendation will resolve your issue.
You don't have enough memory to process multiple tasks at the same time. The best way is to reduce the number of CPUs you assign.
Best, Dereje
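As a rough illustration of the advice above, here is a minimal sketch of how one might estimate a safe worker count from the byte figure in the STAR error message. The function name `max_workers`, the 20% safety reserve, and the 8 GiB machine are all assumptions for illustration; the figure in the log is the allocation that failed, not necessarily the full memory footprint of one alignment, so treat the result as an upper bound at best.

```python
def max_workers(total_ram_bytes: int, per_task_bytes: int,
                reserve_fraction: float = 0.2) -> int:
    """Return how many tasks fit in RAM after holding back a safety reserve.

    Hypothetical helper: not part of temposeqcount or STAR.
    """
    if per_task_bytes <= 0:
        raise ValueError("per_task_bytes must be positive")
    usable = total_ram_bytes * (1.0 - reserve_fraction)
    # Always allow at least one worker so the pipeline can still run.
    return max(1, int(usable // per_task_bytes))


if __name__ == "__main__":
    required = 2_782_013_395      # bytes, copied from the STAR error message
    assumed_ram = 8 * 1024**3     # hypothetical VM with 8 GiB of RAM
    print(max_workers(assumed_ram, required))
```

The number returned would then be the value to try for -c / --cpuNum.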
@Agpa88
Reduce the number you pass to -c CPUNUM, --cpuNum CPUNUM. Try something like -c 3; I don't know how many you used. The default is 4.
Dear @demis001 and @jshousephd, The solution you suggested worked! I had to pass 2 (3 was still too much) as the CPUNUM argument for the --cpuNum option. I have a virtual Linux machine on a company server, so I guess I must have very little memory allocated to it. Nevertheless, you were extremely helpful, and as a newbie to GitHub and Linux, I am astonished at how this community works. Thanks a lot once again!! Best, Agnieszka
Dear Dereje,
Unfortunately, I have encountered another instance of the “incomplete result tables are generated” problem, so I am reopening the issue. I am now trying to use the Temposeqcount tool for the Biospyder Whole Transcriptome assay. This means that I would like to process slightly bigger FASTQ files (up to 250 MB, so around 5x bigger than for the S1500 assay) and feed the script a manifest file of 22,000 sequences (so around 7 times more than for the S1500 assay, which usually consists of up to 3,000 detection oligo sequences).
What I observe is:
- The program stops prematurely, without entering stage 9 and without giving any error (see the screenshot)
- The file called “resultDATA_alignment_summary.csv” is not generated
- The file called “resultDATA_COUNT_countcombined.csv” contains zeros for some samples (attachment) resultDATA_COUNT_countcombined_incomplete.xlsx
- Sample-specific .log.out files have incomplete logs (attachment) Example_of_unsuccessful_sample.fastqLog.txt
Since I now work on an Azure Linux virtual machine with top-end specifications (128 GB RAM, 64 virtual CPUs), I don’t think it’s a memory or resource allocation problem.
To get better insight into what is happening, I ran several combinations of input files:
So, it seems to me that the bottleneck is the increased number of sequences in the manifest file that have to be processed. Do you have any ideas how to solve it? Or maybe Temposeqcount was designed for S1500 assays and cannot cope with the whole transcriptome assay?
I would again appreciate your help very much. Best, Agnieszka
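To make the "zeros for some samples" symptom easier to report, a small sketch like the following could list which sample columns in the combined count table are entirely zero. The table layout is an assumption (one leading identifier column, then one column per sample), and `all_zero_samples` is a hypothetical helper, not something shipped with temposeqcount.

```python
import csv
import io


def all_zero_samples(csv_file, id_columns=1):
    """Return the names of sample columns whose counts are all zero.

    Assumes the first `id_columns` columns hold probe identifiers and
    every remaining column is one sample (layout not confirmed here).
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    samples = header[id_columns:]
    nonzero = set()
    for row in reader:
        for name, value in zip(samples, row[id_columns:]):
            if value.strip() and float(value) != 0:
                nonzero.add(name)
    # Preserve the original column order in the report.
    return [name for name in samples if name not in nonzero]


if __name__ == "__main__":
    demo = io.StringIO("probe,sampleA,sampleB\nP1,5,0\nP2,0,0\n")
    print(all_zero_samples(demo))
```

For a real run, pass an open handle to resultDATA_COUNT_countcombined.csv (opened with `newline=""`) instead of the in-memory demo.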
@Agpa88
Would you please do this and paste the result for the failed samples?
cd outdir/result/tempFolder
Here, outdir is the name you gave your output directory.
Then, run:
ll
in the terminal and paste the output.
@demis001
Here it is: Agnieszka
Would you please send me the *.fastqLog.* files for the single sample?
Dereje
Isn't it the one I attached in the previous message? Or do you mean another one? Agnieszka
@Agpa88,
I don't see any problem in the log file you sent. If you don't mind, could you share the manifest csv file and the fastq file that failed with me through Google Drive? I will run it on my system to troubleshoot it. Do you see any error in *.fastqLog.progress.out?
Best, Dereje
Dear Dereje,
To me it seems that the endings of the logs for successful and failing samples are different (see the attachments).
Under the following WeTransfer link there is everything you need to try it on your side: link
Thanks a lot, Agnieszka
Hi Agnieszka,
I spent a few hours tracking down the error. Here is what I found:
It looks like the scaling factor for the STAR genome index didn't work for this library. Here is a quick fix. I will resolve this in a future update.
```
deactivate
vim temposeqcount/tasks.py
# In tasks.py, replace this line:
#     '--genomeSAindexNbases', str(scale_factor),
# with:
#     '--genomeSAindexNbases', str(8),
make install
source tempseqcount/bin/activate
```
I will send you the counts if you send me your actual email; I don't want to share the actual data here.
Let me know...
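For context on the quick fix above: the STAR manual suggests scaling --genomeSAindexNbases down for small references to min(14, log2(GenomeLength)/2 - 1). The sketch below only evaluates that rule of thumb. It is not the scale_factor logic in tasks.py, the rounding down is a choice made here, and the 50-nucleotide probe length used in the example is an assumption rather than something stated in this thread.

```python
import math


def sa_index_nbases(genome_length: int) -> int:
    """Evaluate STAR's rule of thumb: min(14, log2(GenomeLength)/2 - 1).

    Hypothetical helper; the result is rounded down and kept >= 1.
    """
    if genome_length < 1:
        raise ValueError("genome_length must be a positive number of bases")
    suggested = int(math.log2(genome_length) / 2 - 1)
    return max(1, min(14, suggested))


if __name__ == "__main__":
    # Assumed reference size: 22,000 probes of 50 nt each (illustrative only).
    total_bases = 22_000 * 50
    print(sa_index_nbases(total_bases))
```

The hard-coded 8 in the quick fix is the maintainer's choice for this library; the formula is only a starting point when picking a value for a different manifest.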
Dear Dereje,
It worked! It's truly awesome! I have fought with that for several weeks.
Don't bother sending me counts. Anyway I have plenty more samples to process.
So far I have successfully completed a run with 2 big FASTQ files + the whole transcriptome manifest file. I will let you know soon how it works with the whole data set (24 big FASTQ files).
For now I would like to thank you very, very much. I greatly appreciate that you spent so much time on troubleshooting.
Thanks once again. Best wishes, Agnieszka
Dear Dereje,
A little update - it also worked for the whole data set of 24 FASTQ files + the manifest with 22,000 genes. Thanks! Agnieszka
Thank you for the update, and don't forget to reference our paper in the future.
Best regards, Dereje
Dear Temposeqcount Creator,
I am interested in using the Temposeqcount application; however, I encountered a problem while executing it. I have 64 samples (fastq.gz files) to process, and whenever I run the script, .fastqLog.final.txt files are not generated for all fastq.gz files (they are missing from /result/tempFolder). Consequently, the resultDATA_alignment_summary.csv file contains statistics only for those fastq.gz files for which .fastqLog.final.txt files were generated. Moreover, resultDATA_COUNT_countcombined.csv contains zeros for all samples with missing .fastqLog.final.txt files (see the attachments).
Could it be that the formats of the input files are not correct? I am also attaching a screenshot of the manifest file so its format can be verified.
The strangest thing is that whenever I run the script, I get results for only 1-2 samples, and they are different each time.
Thanks, Agnieszka
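For anyone hitting the same symptom, a quick way to see which samples failed is to compare the input FASTQ files against the per-sample log files. This is a minimal sketch under stated assumptions: the helper `samples_missing_logs` is hypothetical, and the log naming (sample name followed by `.fastqLog.final.txt`) is taken from the description above rather than from the tool's documentation, so adjust the suffixes to match what you actually see in result/tempFolder.

```python
from pathlib import Path


def samples_missing_logs(fastq_dir, log_dir,
                         fastq_suffix=".fastq.gz",
                         log_suffix=".fastqLog.final.txt"):
    """List sample names that have a FASTQ file but no final log file.

    Both suffixes are assumptions about the naming scheme.
    """
    fastq_dir, log_dir = Path(fastq_dir), Path(log_dir)
    missing = []
    for fastq in sorted(fastq_dir.glob(f"*{fastq_suffix}")):
        sample = fastq.name[: -len(fastq_suffix)]
        if not (log_dir / f"{sample}{log_suffix}").exists():
            missing.append(sample)
    return missing


if __name__ == "__main__":
    # Hypothetical paths; replace with your own input and output directories.
    for name in samples_missing_logs("fastq", "outdir/result/tempFolder"):
        print(name)
```

Running it after each attempt shows whether the set of failing samples changes from run to run, which is a hint that resources rather than input formats are the cause.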