Closed TTT16 closed 6 months ago
This usually happens when the chromosome names in your genome file do not match the ones in the annotation gtf. This seems unlikely in your case (judging by the names they were retrieved from the same source), but please verify by making sure that GCA_000001405.15_GRCh38_no_alt_analysis_set.fna
has a similar name structure (chr1 or just 1) as GCA_000001405.15_GRCh38_full_analysis_set.refseq_annotation.gtf
(first field).
They have the same chr1
(/cm/shared/apps/flair/2.0.0) [tmhtxt53@hmrihpcp03 GCA_000001405.15_GRCh38_noALT]$ head GCA_000001405.15_GRCh38_full_analysis_set.refseq_annotation.gtf
chr1 BestRefSeq gene 11874 14409 . + . gene_id "DDX11L1"; transcript_id ""; db_xref "GeneID:100287102"; db_xref "HGNC:HGNC:37102"; description "DEAD/H-box helicase 11 like 1 (pseudogene)"; gbkey "Gene"; gene "DDX11L1"; gene_biotype "transcribed_pseudogene"; pseudo "true"; chr1 BestRefSeq transcript 11874 14409 . + . gene_id "DDX11L1"; transcript_id "NR_046018.2"; db_xref "GeneID:100287102"; gbkey "misc_RNA"; gene "DDX11L1"; product "DEAD/H-box helicase 11 like 1 (pseudogene)"; pseudo "true"; transcript_biotype "transcript"; chr1 BestRefSeq exon 11874 12227 . + . gene_id "DDX11L1"; transcript_id "NR_046018.2"; db_xref "GeneID:100287102"; gene "DDX11L1"; product "DEAD/H-box helicase 11 like 1 (pseudogene)"; pseudo "true"; transcript_biotype "transcript"; exon_number "1"; chr1 BestRefSeq exon 12613 12721 . + . gene_id "DDX11L1"; transcript_id "NR_046018.2"; db_xref "GeneID:100287102"; gene "DDX11L1"; product "DEAD/H-box helicase 11 like 1 (pseudogene)"; pseudo "true"; transcript_biotype "transcript"; exon_number "2"; chr1 BestRefSeq exon 13221 14409 . + . gene_id "DDX11L1"; transcript_id "NR_046018.2"; db_xref "GeneID:100287102"; gene "DDX11L1"; product "DEAD/H-box helicase 11 like 1 (pseudogene)"; pseudo "true"; transcript_biotype "transcript"; exon_number "3";
(/cm/shared/apps/flair/2.0.0) [tmhtxt53@hmrihpcp03 GCA_000001405.15_GRCh38_noALT]$ head GCA_000001405.15_GRCh38_no_alt_analysis_set.fna
chr1 AC:CM000663.2 gi:568336023 LN:248956422 rl:Chromosome M5:6aef897c3d6ff0c78aff06ac189178dd AS:GRCh38 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
hi, they have the same "chr1", so the issue(s) must come from something else, right?
(/cm/shared/apps/flair/2.0.0) @.*** GCA_000001405.15_GRCh38_noALT]$ head GCA_000001405.15_GRCh38_full_analysis_set.refseq_annotation.gtf
chr1 BestRefSeq gene 11874 14409 . + . gene_id "DDX11L1"; transcript_id ""; db_xref "GeneID:100287102"; db_xref "HGNC:HGNC:37102"; description "DEAD/H-box helicase 11 like 1 (pseudogene)"; gbkey "Gene"; gene "DDX11L1"; gene_biotype "transcribed_pseudogene"; pseudo "true";
chr1 BestRefSeq transcript 11874 14409 . + . gene_id "DDX11L1"; transcript_id "NR_046018.2"; db_xref "GeneID:100287102"; gbkey "misc_RNA"; gene "DDX11L1"; product "DEAD/H-box helicase 11 like 1 (pseudogene)"; pseudo "true"; transcript_biotype "transcript";
chr1 BestRefSeq exon 11874 12227 . + . gene_id "DDX11L1"; transcript_id "NR_046018.2"; db_xref "GeneID:100287102"; gene "DDX11L1"; product "DEAD/H-box helicase 11 like 1 (pseudogene)"; pseudo "true"; transcript_biotype "transcript"; exon_number "1";
chr1 BestRefSeq exon 12613 12721 . + . gene_id "DDX11L1"; transcript_id "NR_046018.2"; db_xref "GeneID:100287102"; gene "DDX11L1"; product "DEAD/H-box helicase 11 like 1 (pseudogene)"; pseudo "true"; transcript_biotype "transcript"; exon_number "2";
chr1 BestRefSeq exon 13221 14409 . + . gene_id "DDX11L1"; transcript_id "NR_046018.2"; db_xref "GeneID:100287102"; gene "DDX11L1"; product "DEAD/H-box helicase 11 like 1 (pseudogene)"; pseudo "true"; transcript_biotype "transcript"; exon_number "3";
(/cm/shared/apps/flair/2.0.0) @.*** GCA_000001405.15_GRCh38_noALT]$ head GCA_000001405.15_GRCh38_no_alt_analysis_set.fna
chr1 AC:CM000663.2 gi:568336023 LN:248956422 rl:Chromosome M5:6aef897c3d6ff0c78aff06ac189178dd AS:GRCh38
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
On Tue, Feb 20, 2024 at 1:31 PM Jeltje @.***> wrote:
This usually happens when the chromosome names in your genome file do not match the ones in the annotation gtf. This seems unlikely in your case (judging by the names they were retrieved from the same source), but please verify by making sure that GCA_000001405.15_GRCh38_no_alt_analysis_set.fna has a similar name structure (chr1 or just 1) as GCA_000001405.15_GRCh38_full_analysis_set.refseq_annotation.gtf (first field).
— Reply to this email directly, view it on GitHub https://github.com/BrooksLabUCSC/flair/issues/327#issuecomment-1954918805, or unsubscribe https://github.com/notifications/unsubscribe-auth/ALAPWNWBA32G6FFCUEEIGILYUT2ZJAVCNFSM6AAAAABDRN7UJSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSNJUHEYTQOBQGU . You are receiving this because you authored the thread.Message ID: @.***>
Thanks for checking that. What are the first few lines of cannot_verify.bed?
Here is the content of cannot_verify.bed. What should I do next? Thanks!
@.*** flair4]$ head TotalRNA-HEKWT_0h_FAL06539_A-flair.output_cannot_verify.bed
chr11_KI270721v1_random 6092 11780 5e46e250-5d5f-4e18-a158-9733aa379d3c 1 + 6092 11780 27,158,119 4 124,83,74,330, 0,1229,1884,5358,
chr11_KI270721v1_random 6097 11795 b36c537e-3eb0-4244-9d23-e7faf23bde40 1 + 6097 11795 27,158,119 4 119,83,78,345, 0,1224,1879,5353,
chr11_KI270721v1_random 6139 11792 ffc52e08-b645-4fb8-a370-2401e33b9c79 1 + 6139 11792 27,158,119 5 77,83,74,171,342, 0,1182,1837,4666,5311,
chr22_KI270734v1_random 131503 137382 1167e0e4-1218-4811-b90f-fc67f643abba 8
chr22_KI270734v1_random 131600 137395 fde8dff6-0163-4202-9e6c-783c3c14c4f1 11 + 131600 137395 27,158,119 5 155,161,101,141,551, 0,235,3842,4558,5244,
chr22_KI270734v1_random 131603 137393 a1166575-39c9-4228-b27a-1357e8cb378c 1
chr22_KI270734v1_random 131612 137386 2c02365c-972f-46db-b3d1-63b0bec43592 12 + 131612 137386 27,158,119 5 143,161,101,141,542, 0,223,3830,4546,5232,
chr22_KI270734v1_random 131912 137386 7b6f44ec-4da7-4a32-ac8f-619968bb8597 8
chr22_KI270734v1_random 135513 137393 fb3bf38e-48a4-47b2-9097-8aac6957c83c 3
chr22_KI270734v1_random 136156 137387 4783b28a-77e9-4116-9d78-cb9a3be138e2 1
On Wed, Feb 21, 2024 at 10:14 AM Jeltje @.***> wrote:
Thanks for checking that. What are the first few lines of cannot_verify.bed?
— Reply to this email directly, view it on GitHub https://github.com/BrooksLabUCSC/flair/issues/327#issuecomment-1957151571, or unsubscribe https://github.com/notifications/unsubscribe-auth/ALAPWNV7656EMAX7CDH45S3YUYMNVAVCNFSM6AAAAABDRN7UJSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSNJXGE2TCNJXGE . You are receiving this because you authored the thread.Message ID: @.***>
There's something odd about your input arguments:
-m
is no longer an option in Flair 2.0, and when it was in previous versions it needed an argument.
--nvrna
needs 2 dashes or it will be ignored
--annotated_bed
needs an argument or the program will die in the collapse step. You don't actually need --annotated_bed
because you're already giving it a gtf input and Flair can use that instead.
Are you using Flair 2.0?
So, how my command would be? Could you please suggest?
And I don't know if it is FLAIR 1 or 2.0 since the systems architect in my institution installed it. How can we check the version of flair on HPC? And if possible, could you please send me the github link to FLAIR 2?
Thank you very much for your help, Best, Trinh
On Wed, Feb 21, 2024 at 11:02 AM Jeltje @.***> wrote:
There's something odd about your input arguments: -m is no longer an option in Flair 2.0, and when it was in previous versions it needed an argument. --nvrna needs 2 dashes or it will be ignored --annotated_bed needs an argument or the program will die in the collapse step. You don't actually need --annotated_bed because you're already giving it a gtf input and Flair can use that instead.
Are you using Flair 2.0?
— Reply to this email directly, view it on GitHub https://github.com/BrooksLabUCSC/flair/issues/327#issuecomment-1957315199, or unsubscribe https://github.com/notifications/unsubscribe-auth/ALAPWNWKNENATIYWBZGCPCDYUYSA5AVCNFSM6AAAAABDRN7UJSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSNJXGMYTKMJZHE . You are receiving this because you authored the thread.Message ID: @.***>
We checked and the version of flair on our HPC is FLAIR v2.0.0. Could you please suggest the correct command, and what are the human genome.fa and annotation.gtf files we should use for hg38?
I have 10 samples including 4 conditions (2 reps, 3 reps, 2 reps, 3 reps), should I concatenate the output files before or after running flair collapse?
And to run flair quantify, for the input file (isoform.fa), do I need to concatenate all 10 isoform.fa from 10 samples, or just use 1 sample only?
Thank you very much for your help,
Best,
Trinh
I think I see the problem. You're running Flair sample by sample and that's not what you want.
It's a lot easier if you separate the steps, something like this (please check this, don't just copy):
flair align \
-g $ref/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna \
-t 30 \
-r $input_dir/$sample1.fq.gz,$input_dir/$sample2.fq.gz,$input_dir/$sample3.fq.gz \
--nvrna \
-o flairout/aligned
flair correct \
-g $ref/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna \
-t 30 \
--query flairout/aligned.bed \
-f $ref/GCA_000001405.15_GRCh38_full_analysis_set.refseq_annotation.gtf \
--nvrna \
-o flairout/correct
Please note there's no need to give it 1024 CPUs, it's not going to be faster. You will then end up with a large output that needs to be split before submitting to collapse. I'm just working on that, so I'll post again shortly.
Hi, I tried to run separate steps of flair, and got these output files. I am not quite sure which one(s) will be helpful or contain what information. Which files I should use for next step? Could you please help? Thank you!
[tmhtxt53@hmrihpcp03 flair6_SR]$ ls chrom_corrected flair6.collapse.annotated_transcripts.isoform.read.map.txt flair6.collapse.isoforms.gtf flair6.aligned.bam flair6.collapse.annotated_transcripts.supported.bed flair6.collapse.unassigned.bed flair6.aligned.bam.bai flair6.collapse.combined.isoform.read.map.txt flair6.collapse.unassigned.fasta flair6.aligned.bed flair6.collapse.firstpass.bed flair6.corrected_all_corrected.bed flair6.collapse.annotated_transcripts.alignment.counts flair6.collapse.firstpass.q.counts flair6.corrected_all_inconsistent.bed flair6.collapse.annotated_transcripts.alignment.mm2_stderr flair6.collapse.firstpass.unfiltered.bed flair6.corrected_cannot_verify.bed flair6.collapse.annotated_transcripts.alignment.sam flair6.collapse.isoform.read.map.txt temp_dir flair6.collapse.annotated_transcripts.bed flair6.collapse.isoforms.bed flair6.collapse.annotated_transcripts.fa flair6.collapse.isoforms.fa
It looks like you ran all three steps (align, correct, and collapse), so you should be finished. Your results should be flair6.collapse.isoforms.bed, and flair6.collapse.isoforms.fa, the files you couldn't find after your original run.
Hi, Here are all the output files I got:
chrom_corrected flair6.collapse.annotated_transcripts.bed flair6.collapse.firstpass.unfiltered.bed flair6.corrected_all_corrected.bed
flair6.aligned.bam flair6.collapse.annotated_transcripts.fa flair6.collapse.isoform.read.map.txt flair6.corrected_all_inconsistent.bed
flair6.aligned.bam.bai flair6.collapse.annotated_transcripts.isoform.read.map.txt flair6.collapse.isoforms.bed flair6.corrected_cannot_verify.bed
flair6.aligned.bed flair6.collapse.annotated_transcripts.supported.bed flair6.collapse.isoforms.fa flair_diffExp
flair6.collapse.annotated_transcripts.alignment.counts flair6.collapse.combined.isoform.read.map.txt flair6.collapse.isoforms.gtf flair.quantify.counts.tsv
flair6.collapse.annotated_transcripts.alignment.mm2_stderr flair6.collapse.firstpass.bed flair6.collapse.unassigned.bed temp_dir
flair6.collapse.annotated_transcripts.alignment.sam flair6.collapse.firstpass.q.counts flair6.collapse.unassigned.fasta
I am not sure which files I should use for next steps (flair_diffExp), could you please help?
And, I also could not find where to download the flair_diffExp. Could you please send me the github link to download it?
Thank you very much for your help.
Best,
Trinh
On Mon, Mar 4, 2024 at 2:16 PM Jeltje @.***> wrote:
It looks like you ran all three steps (align, correct, and collapse), so you should be finished. Your results should be flair6.collapse.isoforms.bed, and flair6.collapse.isoforms.fa, the files you couldn't find after your original run.
— Reply to this email directly, view it on GitHub https://github.com/BrooksLabUCSC/flair/issues/327#issuecomment-1977380444, or unsubscribe https://github.com/notifications/unsubscribe-auth/ALAPWNQWBZIB4HJXAUT4MHLYWTJANAVCNFSM6AAAAABDRN7UJSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSNZXGM4DANBUGQ . You are receiving this because you authored the thread.Message ID: @.***>
Please follow the instructions at https://flair.readthedocs.io/en/latest/
Copy and paste the exact command you tried to run
SBATCH --mail-type=END,FAIL
SBATCH --mail-user=ttat@houstonmethodist.org
SBATCH --error=/condo/kisslab/tmhtxt53/GridIon/SLURM_OUT/%j.err
SBATCH --output=/condo/kisslab/tmhtxt53/GridIon/SLURM_OUT/%j.out
#######################################################
pwd; hostname; date
# module load mamba
conda activate
module load flair module list # sample_file="/condo/kisslab/tmhtxt53/GridIon/ready_analyzed_samples/sample_names.txt" parent_dir="/condo/kisslab/tmhtxt53/GridIon/ready_analyzed_samples" input_dir="/condo/kisslab/tmhtxt53/GridIon/ready_analyzed_samples/curated-10Samples_fastq" ref="/scratch2/ref/GCA_000001405.15_GRCh38_noALT" ############################################## # cd $parent_dir pwd=$(pwd) mkdir -p flair4_quantify # while IFS= read -r sample; do # echo sample is $sample cd "$sample" || { echo "Error: Unable to enter the folder $sample_name"; exit 1; } echo we are in sample folder pwd # sample_pwd=$(pwd) # folder_name="2023*" full_folder_name=$(find . -maxdepth 1 -type d -name "$folder_name" -print -quit) # inner_folder=$(echo $full_folder_name | awk -F '[/]' '{print $2}') echo "inner folder is: $inner_folder" # cd $sample_pwd/$inner_folder echo we are inside sample and inner folder pwd
done < "$sample_file" # ENDTIME=$(date +%s) echo "Final End Time is $(date)" echo "It took ($ENDTIME-$STARTTIME) secs" exit 0
What happened?
Hi, I tried to run flair as the code above, and I DID NOT get the output files "...output.isoform.bed", "...output.isoform.fa", and "...output.isoform.gtf", instead of that, we got "...output_cannot_verify.bed"
[tmhtxt53@hmrihpcp03 flair4]$ ls TotalRNA-HEKWT_0h_FAL06539_A-flair.output_all_corrected.bed TotalRNA-HEKWT_0h_FAL06539_A-flair.output_all_inconsistent.bed TotalRNA-HEKWT_0h_FAL06539_A-flair.output.bam TotalRNA-HEKWT_0h_FAL06539_A-flair.output.bam.bai TotalRNA-HEKWT_0h_FAL06539_A-flair.output.bed TotalRNA-HEKWT_0h_FAL06539_A-flair.output_cannot_verify.bed
What is wrong with my code? Thank you, Trinh Please replace this text with the whole error message