Closed jakobht96 closed 7 months ago
I think what you're saying is that you ran flair collapse on separate samples, and then concatenated the output. If you do it that way you will very likely have duplicate isoforms in there, with different names.
Instead, you should be able to run collapse on separate chromosomes by doing something like
for chrom in $(cut -f1 corrected.bed | sort -u); do
    # exact match on column 1, so chr1 does not also pick up chr10, chr11, ...
    awk -v c="$chrom" '$1 == c' corrected.bed > "$chrom.corrected.bed"
    awk -v c="$chrom" '$1 == c' annotation.gtf > "$chrom.annot.gtf"
done
wget http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/faSplit && chmod 775 faSplit
./faSplit byname genome.fa ./
and then, for each chromosome (with $chrom set as in the loop above), run
flair collapse --gtf $chrom.annot.gtf --reads reads1.fq,reads2.fq(..),reads18.fq --query $chrom.corrected.bed --genome $chrom.fa
You can then run quantify also per chromosome and concatenate the outputs (be sure to remove the header lines).
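As a minimal sketch of the concatenation step: the snippet below keeps the header of the first table and drops line 1 of every later one. The file names (chr1.counts.tsv, all.counts.tsv) and the table contents are made-up demo data, not actual flair output names, and it assumes every per-chromosome table has the same columns in the same order.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Demo data only: two tiny stand-ins for per-chromosome count tables.
workdir=$(mktemp -d)
cd "$workdir"
printf 'ids\tsample1\tsample2\nisoA\t5\t7\n' > chr1.counts.tsv
printf 'ids\tsample1\tsample2\nisoB\t2\t0\nisoC\t1\t4\n' > chr2.counts.tsv

# FNR is the line number within the current file, NR the overall line number.
# Line 1 of every file except the very first one is a repeated header: skip it.
awk 'FNR == 1 && NR > 1 { next } { print }' chr*.counts.tsv > all.counts.tsv

cat all.counts.tsv
```

If the per-chromosome tables could differ in column order, it would be safer to compare their header lines before merging.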
If that works, please close this ticket. If not, please let us know what happens.
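One detail when splitting by chromosome name: a prefix pattern such as ^chr1 also matches chr10, chr11 and so on, so comparing the first column exactly is safer. A small sketch with made-up demo data (the three-line corrected.bed here is only a stand-in):

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"
# Demo BED-like data (made up): chr1 and chr10 share the prefix "chr1".
printf 'chr1\t0\t100\tr1\nchr10\t5\t50\tr2\nchr1\t200\t300\tr3\n' > corrected.bed

for chrom in $(cut -f1 corrected.bed | sort -u); do
    # Exact comparison on column 1, so chr1 does not also collect chr10 lines.
    awk -v c="$chrom" -F'\t' '$1 == c' corrected.bed > "$chrom.corrected.bed"
done

wc -l chr1.corrected.bed chr10.corrected.bed
```

The same comparison works for the GTF file, since its first column is also the sequence name and comment lines starting with # never match.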
Thank you, I will try it immediately 😄 I will return when it has been processed.
Hi Jeltje, Could you please clarify what annotation.gtf, $chrom.annot.gtf, and genome.fa refer to in the commands above? I am working with hg38. Thank you, Trinh
It did not work yet. I made the mistake of using only chrom1 in the script. I need to revise my command.
This worked perfectly! Thanks a lot 😄
However, the genes of (my) interest are no longer in the count.tsv file. They are present when the samples are run separately. Is this because an isoform needs more supporting evidence when samples are run together than when flair is run on each sample separately?
(I have tried grep for the genes of interest but was unable to find them)
If you wish me to open another ticket for my issue, I will do that.
Yes, please. I am still struggling to make the flair scripts work.
I got these output files. Not sure which files I should use for next steps. Could you please explain? Thanks!
[flair6_SR]$ ls
chrom_corrected
flair6.aligned.bam
flair6.aligned.bam.bai
flair6.aligned.bed
flair6.collapse.annotated_transcripts.alignment.counts
flair6.collapse.annotated_transcripts.alignment.mm2_stderr
flair6.collapse.annotated_transcripts.alignment.sam
flair6.collapse.annotated_transcripts.bed
flair6.collapse.annotated_transcripts.fa
flair6.collapse.annotated_transcripts.isoform.read.map.txt
flair6.collapse.annotated_transcripts.supported.bed
flair6.collapse.combined.isoform.read.map.txt
flair6.collapse.firstpass.bed
flair6.collapse.firstpass.q.counts
flair6.collapse.firstpass.unfiltered.bed
flair6.collapse.isoform.read.map.txt
flair6.collapse.isoforms.bed
flair6.collapse.isoforms.fa
flair6.collapse.isoforms.gtf
flair6.collapse.unassigned.bed
flair6.collapse.unassigned.fasta
flair6.corrected_all_corrected.bed
flair6.corrected_all_inconsistent.bed
flair6.corrected_cannot_verify.bed
temp_dir
On Mon, Feb 26, 2024 at 1:48 PM jakobht96 wrote (https://github.com/BrooksLabUCSC/flair/issues/324#issuecomment-1965131094):
If you wish me to open another ticket for my issue, I will do that.
Okay, I am trying something here.
for chrom in $(cut -f1 corrected.bed | sort -u); do
    # exact match on column 1, so chr1 does not also pick up chr10, chr11, ...
    awk -v c="$chrom" '$1 == c' corrected.bed > "$chrom.corrected.bed"
    awk -v c="$chrom" '$1 == c' annotation.gtf > "$chrom.annot.gtf"
done
wget http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/faSplit && chmod 775 faSplit
./faSplit byname genome.fa ./
The flair collapse command didn't work as expected when run on its own, so I changed it to a for loop instead:
for chrom in $(cut -f1 corrected.bed | sort -u)
do
    flair collapse --gtf $chrom.annot.gtf --reads reads1.fq,reads2.fq(..),reads18.fq --query $chrom.corrected.bed --genome $chrom.fa <optionals>
done
I am no expert, so I don't know if I could have avoided the for-loop but, hey, if it works 🤷
Thanks. I will close this ticket when I know the result from my run.
Because I am working on a separate server, I have to use nohup to keep the script running in the background. However, nohup does not work directly with a for loop, so I put the loop in a .sh script and ran that with nohup.
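For anyone in the same situation, here is a rough sketch of that wrapper-script approach. nohup expects a command to execute, and `for` is a shell keyword rather than a command, which is why the loop goes into its own file. The script name run_collapse.sh, the log name collapse.log, the two read file names and the tiny corrected.bed are all hypothetical, and the flair collapse line is only echoed (a dry run) so the sketch runs without flair installed.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

# Tiny stand-in for corrected.bed (made-up demo data).
printf 'chr1\t0\t100\tread1\nchr2\t5\t50\tread2\n' > corrected.bed

# Put the loop into its own script so nohup has a real command to run.
cat > run_collapse.sh <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
for chrom in $(cut -f1 corrected.bed | sort -u); do
    # DRY RUN: echo prints the command instead of executing it.
    # Remove "echo" to actually run flair collapse.
    echo flair collapse --gtf "$chrom.annot.gtf" --reads reads1.fq,reads2.fq \
        --query "$chrom.corrected.bed" --genome "$chrom.fa"
done
EOF
chmod +x run_collapse.sh

# Run in the background, immune to hangups, with all output in a log file.
nohup ./run_collapse.sh > collapse.log 2>&1 &

# Only for this demo: wait for the job so the log can be shown right away.
wait $!
cat collapse.log
```

On a real run you would drop the `wait` and check collapse.log after logging back in. An inline alternative is `nohup bash -c '...' &` with the loop inside the quotes.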
It worked! I will close this ticket now :)
We're in the middle of refactoring flair collapse to make all of this a lot easier!
I get an error during flair quantify when I run the following command:
flair quantify -r ./quantify/read_manifest.tsv -i ./quantify/all_samples.isoforms.fa -o ./quantify/all_samples --threads 38 --sample_id_only --trust_ends --generate_map
Installation
Flair was installed in a conda env using bioconda.
Error
The all_samples.isoforms.fa file is a concatenation of the FASTA files from 18 samples (I had to run them separately because of the 1 GB limit in collapse). I am able to align, but quantifying isoforms fails with the following text:
Step 2/3. Quantifying isoforms for sample Sample01_isoseq: 1/18
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/t0md/miniconda3/envs/flair/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/t0md/miniconda3/envs/flair/lib/python3.10/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/home/t0md/miniconda3/envs/flair/lib/python3.10/site-packages/flair/count_sam_transcripts.py", line 243, in count_transcripts_for_reads
    transcript_lengths[t] - read_right, softclip_left, softclip_right, transcripts[t].mapq)
KeyError: 'm64463e_230824_062930/127077064/ccs_NC_060935.1:90178000'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/t0md/miniconda3/envs/flair/lib/python3.10/site-packages/flair/count_sam_transcripts.py", line 369, in <module>
    counts = p.map(count_transcripts_for_reads, grouped_reads)
  File "/home/t0md/miniconda3/envs/flair/lib/python3.10/multiprocessing/pool.py", line 367, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/t0md/miniconda3/envs/flair/lib/python3.10/multiprocessing/pool.py", line 774, in get
    raise self._value
KeyError: 'm64463e_230824_062930/127077064/ccs_NC_060935.1:90178000'
Notes
First, I tried running only two samples as a test, which worked perfectly. Is flair quantify also limited to 1 GB files? My all_samples.isoforms.fa is 8.8 GB. The read manifest looks like this: read_manifest.txt
I hope you can help 😃
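Since all_samples.isoforms.fa was built by concatenating per-sample FASTA files, one check that might be worth running is whether the same sequence appears under several names. The sketch below is not part of flair and does not by itself explain the KeyError. It uses a made-up demo FASTA (demo.isoforms.fa) and only finds exactly identical sequences, which is a stricter test than whatever flair uses to decide that two isoforms are the same.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"
# Demo FASTA (made up): isoA appears twice under different names.
cat > demo.isoforms.fa <<'EOF'
>isoA_sample1
ACGTACGT
ACGT
>isoB_sample1
TTTTGGGG
>isoA_sample2
ACGTACGTACGT
EOF

awk '
    /^>/ { name = substr($1, 2); next }   # header line: keep the first word as the name
    { seq[name] = seq[name] $0 }          # sequence line: append to the current record
    END {
        # Group record names by their full sequence.
        for (n in seq) {
            s = seq[n]
            group[s] = (s in group) ? group[s] "," n : n
            count[s]++
        }
        # Print one comma-separated line per sequence shared by several names.
        for (s in group) if (count[s] > 1) print group[s]
    }
' demo.isoforms.fa > duplicate_groups.txt

cat duplicate_groups.txt
```

This holds every sequence in memory, so for a file of several GB it may be more practical to run it per sample pair or per chromosome.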