BrooksLabUCSC / flair

Full-Length Alternative Isoform analysis of RNA

error in flair quantify #324

Closed jakobht96 closed 7 months ago

jakobht96 commented 7 months ago

I get an error during flair quantify. I ran the following command:

flair quantify -r ./quantify/read_manifest.tsv -i ./quantify/all_samples.isoforms.fa -o ./quantify/all_samples --threads 38 --sample_id_only --trust_ends --generate_map

Installation

Flair was installed in a conda env using bioconda.

Error

all_samples.isoforms.fa is a concatenation of the isoform FASTA files from 18 samples (I had to run the samples separately because of the 1 GB limit in flair collapse). Alignment succeeds, but quantifying isoforms fails with the following output:

Step 2/3. Quantifying isoforms for sample Sample01_isoseq: 1/18
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/t0md/miniconda3/envs/flair/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/t0md/miniconda3/envs/flair/lib/python3.10/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/home/t0md/miniconda3/envs/flair/lib/python3.10/site-packages/flair/count_sam_transcripts.py", line 243, in count_transcripts_for_reads
    transcript_lengths[t] - read_right, softclip_left, softclip_right, transcripts[t].mapq)
KeyError: 'm64463e_230824_062930/127077064/ccs_NC_060935.1:90178000'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/t0md/miniconda3/envs/flair/lib/python3.10/site-packages/flair/count_sam_transcripts.py", line 369, in <module>
    counts = p.map(count_transcripts_for_reads, grouped_reads)
  File "/home/t0md/miniconda3/envs/flair/lib/python3.10/multiprocessing/pool.py", line 367, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/t0md/miniconda3/envs/flair/lib/python3.10/multiprocessing/pool.py", line 774, in get
    raise self._value
KeyError: 'm64463e_230824_062930/127077064/ccs_NC_060935.1:90178000'

Notes

First, I tried to run only two samples as a test, which worked perfectly. Is flair quantify also limited to 1 GB input files? My all_samples.isoforms.fa is 8.8 GB. The read manifest looks like this: read_manifest.txt

I hope you can help 😃

Jeltje commented 7 months ago

I think what you're saying is that you ran flair collapse on separate samples, and then concatenated the output. If you do it that way you will very likely have duplicate isoforms in there, with different names.

Instead, you should be able to run collapse on separate chromosomes by doing something like

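# Split the corrected BED and the annotation GTF into one file per chromosome.
# Match column 1 exactly: a prefix pattern such as ^chr1 would also match chr10.
for chrom in $(cut -f1 corrected.bed | sort -u); do
    awk -v c="$chrom" '$1 == c' corrected.bed > "$chrom.corrected.bed"
    awk -v c="$chrom" '$1 == c' annotation.gtf > "$chrom.annot.gtf"
done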
wget http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/faSplit && chmod 775 faSplit
./faSplit byname genome.fa ./

and then run flair collapse --gtf $chrom.annot.gtf --reads reads1.fq,reads2.fq(..),reads18.fq --query $chrom.corrected.bed --genome $chrom.fa
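When splitting by chromosome, the match needs to cover the whole first column, because a prefix pattern like ^chr1 also picks up chr10, chr11 and so on. A minimal sketch of the difference, using a made-up three-line BED file in a temporary directory (the file contents and variable names are assumptions for illustration only):

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
# Hypothetical three-line BED file; real input would be the corrected BED.
printf 'chr1\t100\t200\tread1\nchr10\t300\t400\tread2\nchr1\t500\t600\tread3\n' \
    > "$workdir/corrected.bed"

# Prefix match: the pattern ^chr1 also matches the chr10 line.
prefix_count=$(grep -c '^chr1' "$workdir/corrected.bed")

# Exact match on column 1 keeps only true chr1 records.
exact_count=$(awk -v c='chr1' '$1 == c' "$workdir/corrected.bed" | wc -l | tr -d ' ')

echo "prefix match: $prefix_count lines, exact match: $exact_count lines"
```

Here the prefix match returns 3 lines and the exact match 2, so the prefix version would put chr10 reads into the chr1 file.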

You can then run quantify also per chromosome and concatenate the outputs (be sure to remove the header lines).
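Concatenating while keeping only the first header line could look roughly like this. The file names, the `ids` header and the counts are invented for the demo, and the sketch assumes every per-chromosome table has the same columns in the same order:

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
# Hypothetical per-chromosome count tables, each with its own header line.
printf 'ids\tsampleA\tsampleB\niso1_gene1\t5\t7\n' > "$workdir/chr1.counts.tsv"
printf 'ids\tsampleA\tsampleB\niso2_gene2\t1\t0\n' > "$workdir/chr2.counts.tsv"

# FNR restarts at 1 for every input file and NR does not, so this skips the
# first line of every file except the very first one.
awk 'FNR == 1 && NR != 1 { next } { print }' "$workdir"/chr*.counts.tsv \
    > "$workdir/all.counts.tsv"

cat "$workdir/all.counts.tsv"
```

The combined table then has one header followed by the rows of all chromosomes.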

If that works, please close this ticket. If not, please let us know what happens.
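To check whether a concatenated isoform FASTA really contains the same sequence under different names, something like the following may help. It is only a sketch: the demo records are invented, it holds all sequences in memory (which may be too much for a multi-GB file), and it compares sequences for exact identity only, so near-identical isoforms are not detected.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
# Hypothetical demo input standing in for a concatenated isoform FASTA.
cat > "$workdir/all_samples.isoforms.fa" <<'EOF'
>isoA_sample1
ACGTACGT
ACGT
>isoB_sample1
TTTTGGGG
>isoA_sample2
ACGTACGTACGT
EOF

# Join wrapped sequence lines, then group record names by identical sequence.
dup_report=$(awk '
    /^>/ { name = substr($1, 2); next }   # header line: remember the record name
    { seq[name] = seq[name] $0 }          # sequence line: append to current record
    END {
        for (n in seq) names[seq[n]] = names[seq[n]] " " n
        for (s in names) {
            k = split(names[s], parts, " ")
            if (k > 1) print k " records share one sequence:" names[s]
        }
    }
' "$workdir/all_samples.isoforms.fa")

echo "$dup_report"
```

In the demo, isoA_sample1 and isoA_sample2 are reported together because their sequences are identical once the wrapped lines are joined; isoB_sample1 is not reported.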

jakobht96 commented 7 months ago

Thank you, I will try it immediately 😄 I will return when it has been processed.

TTT16 commented 7 months ago

Hi Jeltje, Could you please clarify what annotation.gtf, $chrom.annot.gtf, and genome.fa refer to in the commands above? I am working with hg38. Thank you, Trinh

jakobht96 commented 7 months ago

EDIT

It did not work yet. I made a mistake and only used chrom1 in the script. I need to revise my command.

Original

This worked perfectly! Thanks a lot 😄

However, my genes of interest are no longer in the count.tsv file. They are present when the samples are run separately. Is this because an isoform needs more supporting evidence when the samples are run together than when flair is run on each sample separately?

(I tried to grep for the genes of interest but could not find them.)

jakobht96 commented 7 months ago

If you wish me to open another ticket for my issue, I will do that.

TTT16 commented 7 months ago

Yes, please. I am still struggling to make the flair scripts work.

I got these output files. Not sure which files I should use for next steps. Could you please explain? Thanks!

[flair6_SR]$ ls
chrom_corrected
flair6.aligned.bam
flair6.aligned.bam.bai
flair6.aligned.bed
flair6.collapse.annotated_transcripts.alignment.counts
flair6.collapse.annotated_transcripts.alignment.mm2_stderr
flair6.collapse.annotated_transcripts.alignment.sam
flair6.collapse.annotated_transcripts.bed
flair6.collapse.annotated_transcripts.fa
flair6.collapse.annotated_transcripts.isoform.read.map.txt
flair6.collapse.annotated_transcripts.supported.bed
flair6.collapse.combined.isoform.read.map.txt
flair6.collapse.firstpass.bed
flair6.collapse.firstpass.q.counts
flair6.collapse.firstpass.unfiltered.bed
flair6.collapse.isoform.read.map.txt
flair6.collapse.isoforms.bed
flair6.collapse.isoforms.fa
flair6.collapse.isoforms.gtf
flair6.collapse.unassigned.bed
flair6.collapse.unassigned.fasta
flair6.corrected_all_corrected.bed
flair6.corrected_all_inconsistent.bed
flair6.corrected_cannot_verify.bed
temp_dir


jakobht96 commented 7 months ago

Okay, I am trying something here.

This worked

# Match column 1 exactly so that e.g. chr1 does not also pick up chr10.
for chrom in $(cut -f1 corrected.bed | sort -u); do
    awk -v c="$chrom" '$1 == c' corrected.bed > "$chrom.corrected.bed"
    awk -v c="$chrom" '$1 == c' annotation.gtf > "$chrom.annot.gtf"
done
wget http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/faSplit && chmod 775 faSplit
./faSplit byname genome.fa ./

However

The flair collapse command didn't work as expected, so I changed it to a for loop instead:


for chrom in $(cut -f1 corrected.bed | sort -u); do
    flair collapse --gtf "$chrom.annot.gtf" --reads reads1.fq,reads2.fq(..),reads18.fq --query "$chrom.corrected.bed" --genome "$chrom.fa" <optionals>
done

I am no expert, so I don't know if I could have avoided the for-loop but, hey, if it works 🤷
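One thing that may be worth checking with a per-chromosome loop is that each iteration writes to its own output prefix, so later chromosomes do not overwrite earlier results. The dry run below only prints the commands and does not call flair. The `--output` flag is an assumption on my side (please confirm it with `flair collapse --help`), and the read file names and the `.collapse` prefix are placeholders:

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
# Hypothetical two-chromosome BED file standing in for corrected.bed.
printf 'chr1\t100\t200\tread1\nchr2\t300\t400\tread2\n' > "$workdir/corrected.bed"

# Comma-separated read files; placeholder names, list all samples in practice.
reads="reads1.fq,reads2.fq"

cmds=()
while read -r chrom; do
    # --output is assumed to set the output file name base (not verified here).
    cmds+=("flair collapse --gtf $chrom.annot.gtf --reads $reads --query $chrom.corrected.bed --genome $chrom.fa --output $chrom.collapse")
done < <(cut -f1 "$workdir/corrected.bed" | sort -u)

# Dry run: print the commands instead of executing them.
printf '%s\n' "${cmds[@]}"
```

Once the printed commands look right, they can be pasted into a script and run for real.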

Thanks. I will close this ticket when I know the result from my run.

Additional info for beginners like myself

Because I am working on a separate server, I have to use nohup to keep the script running in the background. However, nohup does not work directly with a for loop. Therefore, I put the loop in a .sh script and ran that script with nohup.
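The reason is that `for` is a shell keyword rather than a program, so nohup has nothing to execute; it needs a single command such as a script. A small sketch of the script approach follows, where the script name, log name and the echo placeholder are all made up for the demo and the real flair call would replace the echo:

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)

# Write the loop into its own script so nohup has a single command to run.
cat > "$workdir/run_collapse.sh" <<'EOF'
#!/usr/bin/env bash
for chrom in chr1 chr2; do
    # Placeholder for the real per-chromosome flair collapse call.
    echo "would process $chrom"
done
EOF
chmod +x "$workdir/run_collapse.sh"

# Run in the background, immune to hangups, with all output sent to a log file.
nohup "$workdir/run_collapse.sh" > "$workdir/run_collapse.log" 2>&1 < /dev/null &
wait $!   # only for this demo; on a server you would simply log out

cat "$workdir/run_collapse.log"
```

An alternative that avoids the extra file is to wrap the loop in a quoted `bash -c '...'` command and pass that to nohup.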

jakobht96 commented 7 months ago

It worked! I will close this ticket now :)

Jeltje commented 7 months ago

We're in the middle of refactoring flair collapse to make all of this a lot easier!