BrooksLabUCSC / flair

Full-Length Alternative Isoform analysis of RNA

failed at counting step for isoform read support [COLLAPSE] #204

Closed. Drosofriends closed this issue 2 years ago.

Drosofriends commented 2 years ago

Hi everyone, I'm using FLAIR collapse. I followed the instructions on GitHub for the command options, giving a single FASTQ file with the concatenated reads of my samples of interest (8 samples), concatenated BED files, etc. The collapse started, but after one hour it stopped and returned this message: "failed at counting step for isoform read support". I ruled out a storage problem because I tried using an HD for the temporary and output files. I have no idea what kind of problem it could be.

Thanks!

Jeltje commented 2 years ago

I'm in the process of reorganizing the code, in part to deal with this "hide the scary errors" issue. I'm assuming you retrieved the code using git clone. To update, please run git pull and then git checkout develop. Flair has moved to src/flair/flair.py. This version should give you a proper error message, please post it here.

Drosofriends commented 2 years ago

I followed your instructions, and this is the error message I got:

Traceback (most recent call last):
  File "flair.py", line 11, in <module>
    from flair_correct import correct
  File "/home/*//FLAIR/flair/src/flair/flair_correct.py", line 95
    print(" Adding other juncs, assuming file is %s" % "bed6" if strandCol == -1 else "STAR", file=fo)
    ^
SyntaxError: invalid syntax

Do I have to run the correction again with this "new" code?

Jeltje commented 2 years ago

Actually, you could try to run the original code without the --print_check flag, which is what causes the error.

I have also fixed it in the latest develop commit, so another git pull should allow you to run your original command.

Drosofriends commented 2 years ago

I tried both options but got the same error. I also ran the correction again to test the updated flair.py script, and got the same error there too. Even --help doesn't work. Any other suggestions?

Jeltje commented 2 years ago

What is the exact command you are running?

Drosofriends commented 2 years ago

I ran git pull and then ran the collapse using the updated flair.py. The command for the collapse is:

python /src/flair/flair.py collapse -- genome path/genome.fa --reads path/fastq files separated by space -- query path/corrected concatenated bed files --gtf path/gtf --threads 5 --support 3 --temp_dir --output /path output

Also, could you be more precise about how to run the original code without the --print_check flag?

Jeltje commented 2 years ago

I mean the actual command. It doesn't matter if it has your local paths. In your example it looks like you input -- genome (with a space), which is probably not what you actually do. And temp_dir needs an argument. So if I have the real command it's much easier to find out what's going wrong.

What version of python are you using (python --version)?

About the --print_check argument, just ignore that. I needed to set it to reproduce an error similar to yours with the original code. It looks like that's not the problem.

Drosofriends commented 2 years ago

python home/Documents/flair/src/flair/flair.py collapse --genome home/Documents/ref_seq.fa --reads home/Documents/sample1.fastq home/Documents/sample2.fastq home/Documents/sample3.fastq home/Documents/sample4.fastq home/Documents/sample5.fastq home/Documents/sample6.fastq home/Documents/sample7.fastq home/Documents/sample8.fastq --query home/Documents/all_corrected.bed --gtf home/Documents/ref_seq.gtf --threads 5 --support 3 --temp_dir home/Documents/Elements --output home/Docments/Elements/all_collapsed

(Elements is an external hard disk.) The Python version is 3.7.

Jeltje commented 2 years ago

Thanks! I did a similar run on my end (with Python 3.7) and am having no issues. I have put some test data at https://hgwdev.gi.ucsc.edu/~jeltje/flairTest.tgz. Could you un-tar it and run the original flair (not the development version) like so:

python <your_path_to>/flair.py collapse --genome test_ref/genome.fa --reads test_ref/reads.fq test_ref/reads.1.fq test_ref/reads.2.fq --query test_ref/test.corrected.bed --gtf test_ref/annotation.incomplete.gtf --threads 5 --support 3 --temp_dir <your_temp_dir> --output <your_output_dir>/all_collapsed

Does that give problems?

Drosofriends commented 2 years ago

Hi, thanks for giving me the test files. I ran your test and everything works. Next I ran my own command, and it only partially works: I get the firstpass.bed and firstpass.fa files, but no .gtf file. Instead of the GTF output I have unfiltered.bed, and the intermediate SAM files are kept. It returns the same error message, "failed at counting step for isoform read support". Why? Everything works when I run a single sample (sample.fastq and its corrected file).

Thank you so much!

Jeltje commented 2 years ago

The step that fails is a command that looks something like this:

python flair/bin/count_sam_transcripts.py -s <temp_dir>/tmpzy2rpdiy.firstpass.sam -o <output_dir>/all_collapsed.firstpass.q.counts -t 5 --quality 1 -w 100

Can you run that with the firstpass.sam file in your temp_dir? What error do you get?

Drosofriends commented 2 years ago

The process gets killed. This is the command:

python /home/Documents/FLAIR/flair_collapse/flair-master/bin/count_sam_transcripts.py --sam /home/Documents/Elements/tmpY3SqwR.firstpass.sam -o /home/Documents/Elements/all_collapsed.firstpass.q.counts -t 5 --quality 1 -w 100

If I run 3 samples it works, but with four it does not, regardless of which samples I use.

Jeltje commented 2 years ago

It sounds like your computer is running out of memory. The count_sam_transcripts.py program creates an entry for every input read, so the larger your SAM file becomes, the more memory it needs.
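Since count_sam_transcripts.py keeps an entry per input read, the number of reads in the firstpass SAM file is a rough proxy for how much the counting step has to hold in memory. A minimal sketch for checking that, assuming a plain-text (not BAM) SAM file; the function name count_sam_reads is hypothetical and not part of FLAIR, and a read count is only an indicator, not a memory measurement:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of FLAIR): count distinct read names in a SAM
# file as a rough gauge of how many per-read entries the counting step holds.
count_sam_reads() {
    # Skip header lines (they start with '@'), keep the QNAME column (1),
    # de-duplicate (one read can have several alignments), then count.
    grep -v '^@' "$1" | cut -f1 | sort -u | wc -l | tr -d ' '
}

# Example: count_sam_reads <temp_dir>/tmpzy2rpdiy.firstpass.sam
```

Comparing this number between a run that works (3 samples) and one that fails (4 samples) would show how quickly the input to the counting step grows.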

I have made a note to check if we can do something about that but it will take a while.

In the meantime, if you don't have access to a more powerful computer, you could try to split your input file by chromosome(s) and run each batch separately. To split:

mkdir -p splitdir
for chr in $(cut -f1 test.corrected.bed | sort -u); do
    # keep only lines whose first (chromosome) column matches exactly
    awk -v c="$chr" '$1 == c' test.corrected.bed > "splitdir/$chr.bed"
done

Then concatenate several chromosome files (especially the smaller ones) into batches and run flair.py collapse once for each batch BED file. Give each run a unique output name (--output <your_output_dir>/all_collapsed_chrN) so the runs don't overwrite each other. You can then concatenate the final results.
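The split-and-batch approach can be sketched as two small shell functions. This is a minimal sketch, not part of FLAIR: split_by_chrom writes one BED file per chromosome in a single awk pass (instead of re-reading the file once per chromosome), and batch_beds packs the per-chromosome files, in filename order, into batches of at most a chosen number of lines so that small chromosomes share a run. Both function names and the line limit are hypothetical, and the line count of a BED file is only a proxy for how many reads, and thus how much memory, a run involves.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of FLAIR) for splitting a corrected BED file
# by chromosome and grouping the pieces into batches for separate collapse runs.

# Write one <outdir>/<chrom>.bed per chromosome in a single pass.
split_by_chrom() {
    local bed=$1 outdir=$2
    mkdir -p "$outdir"
    # Column 1 of a BED line is the chromosome name.
    awk -F'\t' -v d="$outdir" '{ print > (d "/" $1 ".bed") }' "$bed"
}

# Concatenate the BED files in <splitdir> into <outdir>/batch_N.bed, starting a
# new batch whenever adding the next file would exceed <max_lines>.
# A single file larger than <max_lines> still gets a batch of its own.
# Use an empty <outdir>: existing batch files would be appended to.
batch_beds() {
    local splitdir=$1 outdir=$2 max_lines=$3
    local batch=1 count=0 n f
    mkdir -p "$outdir"
    for f in "$splitdir"/*.bed; do
        [ -e "$f" ] || continue    # no .bed files: the glob stays unexpanded
        n=$(wc -l < "$f" | tr -d ' ')
        if [ "$count" -gt 0 ] && [ $((count + n)) -gt "$max_lines" ]; then
            batch=$((batch + 1))
            count=0
        fi
        cat "$f" >> "$outdir/batch_$batch.bed"
        count=$((count + n))
    done
}
```

For example, split_by_chrom all_corrected.bed splitdir followed by batch_beds splitdir batches 500000 (an arbitrary example limit) yields batches/batch_1.bed, batches/batch_2.bed, and so on; each one is then passed to flair.py collapse via --query with its own --output name. With assemblies containing very many scaffolds, some awk implementations may hit an open-file limit in split_by_chrom; GNU awk handles this case.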

Drosofriends commented 2 years ago

Hi Jeltje, thank you for your advice. I am trying to solve the memory problem. Meanwhile, I saw in the FLAIR documentation, under "other ways to run FLAIR modules", that a beta version of the collapse module can be used to work around memory problems. Is it useful in my case? How can I install bedPartition? I downloaded it, but I get "no such file or directory" when I launch the command as shown on GitHub. Any tips? Thanks for your time!

Jeltje commented 2 years ago

It may not actually be a memory issue after all, but a temp_dir overload. Today's commit fixes flair.py collapse creating a gigantic SAM file, which will likely solve your issue.
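If the temp_dir filling up is the suspect, it helps to check how much space is free on the file system that holds it before and during a collapse run. A minimal sketch using POSIX df output; the function name free_space_kib is hypothetical and not part of FLAIR:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of FLAIR): print the free space, in KiB, on the
# file system that holds the given directory.
free_space_kib() {
    # With -P, df prints one header line and one data line per file system;
    # column 4 of the data line is the available space (1024-byte blocks with -k).
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Example: free_space_kib <your_temp_dir>
```

Running du -sk <your_temp_dir> alongside it shows how much of that space the temporary files themselves are using.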

If you want to run with ranges, bedPartition can be downloaded as a compiled executable for Linux x64 here. Builds for a few other systems are one level up.

Drosofriends commented 2 years ago

I ran the command with the modified script. There are no SAM files this time, but I still get the same error, "failed at counting step for isoform read support". It seems the problem is strictly related to the limited power of my computer (32 GB RAM, 1 TB SSD).

I downloaded the bedPartition executable from your link, but I still get a "no such file or directory" error. Many thanks for your support!

Jeltje commented 2 years ago

Yes, a regular laptop or desktop is not powerful enough to run flair at this moment. You can verify that it's a memory issue by running htop while flair is running and looking at the Mem bar.
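As a non-interactive alternative to watching htop, available memory can be logged at intervals while flair runs, so the log can be checked afterwards. A minimal sketch, assuming Linux (it reads the MemAvailable field of /proc/meminfo); the function names are hypothetical and not part of FLAIR:

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of FLAIR); Linux-only because they read /proc/meminfo.

# Print MemAvailable in MiB from a meminfo-format file (default: /proc/meminfo).
mem_available_mib() {
    awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 }' "${1:-/proc/meminfo}"
}

# Log available memory every <interval> seconds until process <pid> exits.
watch_mem() {
    local pid=$1 interval=${2:-10}
    while kill -0 "$pid" 2>/dev/null; do
        echo "$(date +%T) MemAvailable=$(mem_available_mib) MiB"
        sleep "$interval"
    done
}
```

For example, start flair.py collapse in the background with & and then run watch_mem $! 30 > mem.log. If MemAvailable drops close to zero just before "failed at counting step for isoform read support" appears, memory is the likely cause.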

Did you put bedPartition in a directory on your $PATH (e.g. ~/bin) and make it executable (chmod 775 bedPartition)?
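Putting a downloaded binary such as bedPartition on the PATH and making it executable can be sketched as below. This assumes a user-writable bin directory (~/bin is just an example choice) and a binary built for your platform; the function name install_on_path is hypothetical and not part of FLAIR or bedPartition:

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy an executable into a bin directory, make it
# executable and print where the shell now finds it.
install_on_path() {
    local src=$1 bindir=${2:-$HOME/bin}
    local name
    name=$(basename "$src")
    mkdir -p "$bindir"
    cp "$src" "$bindir/$name"
    chmod 775 "$bindir/$name"
    # Prepend bindir to PATH for the current shell if it is not already there.
    case ":$PATH:" in
        *":$bindir:"*) ;;
        *) PATH="$bindir:$PATH"; export PATH ;;
    esac
    command -v "$name"
}
```

For example, install_on_path ~/Downloads/bedPartition should print the installed path. The PATH change only affects the current shell, so to keep it for new sessions, add export PATH="$HOME/bin:$PATH" to your shell startup file (e.g. ~/.bashrc).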

Drosofriends commented 2 years ago

Thanks for your help, Jeltje. I solved my problem by using a server, so I can finally close the issue. Best regards