Closed cloudyrainy-zh closed 1 week ago
Hello! Thank you for your question.
For successful execution of the InPACT_quantify
step, several key files are indispensable:
InPACT
command.InPACT_transcript
command.Salmon
, this file is essential for downstream analyses. It is important to note that the index for the transcriptome used in Salmon
should be constructed based on the merged.gtf file.To ensure the smooth execution of the InPACT_quantify
command, it is recommended to meticulously verify the integrity and compatibility of these input files.
Thank you, Liu
Hello, I encountered the same issue. The first step outputted a file named predict.result.txt, and the second step generated a merged.gtf. I believe these two steps should be correct, but I also encountered an error in the third step:
Traceback (most recent call last):
File "/root/miniconda3/envs/InPACT/bin/InPACT_quantify", line 103, in
No matter which one I use, I get the same error!!! Where could the problem be? Therefore, I want to first confirm the format of the quant.sf file, which is very important. Secondly, please help me check if there is any issue with the merged.gtf.
The merged.gtf
was generated from hg38.fa
, using other fa
files would cause errors.
chr1 ncbiRefSeq.2021-12-08 transcript 33306766 33321098 . - . gene_id "A3GALT2"; transcript_id "NM_001080438.1"; gene_name "A3GALT2";
chr1 ncbiRefSeq.2021-12-08 exon 33306766 33307453 . - . gene_id "A3GALT2"; transcript_id "NM_001080438.1"; exon_number "5"; exon_id "NM_001080438.1.5"; gene_name "A3GALT2";
chr1 ncbiRefSeq.2021-12-08 stop_codon 33306766 33306768 . - 0 gene_id "A3GALT2"; transcript_id "NM_001080438.1"; exon_number "5"; exon_id "NM_001080438.1.5"; gene_name "A3GALT2";
chr1 ncbiRefSeq.2021-12-08 CDS 33306769 33307453 . - 1 gene_id "A3GALT2"; transcript_id "NM_001080438.1"; exon_number "5"; exon_id "NM_001080438.1.5"; gene_name "A3GALT2";
Thank you very much.
The error likely stems from an issue during the Salmon run. To resolve it, follow these steps:
First, ensure you have a set of target transcripts for quantification. Extract the transcript sequences from the merged.gtf
file using gffread.
gffread merged.gtf -g <genome.fa> -w transcripts.fa
Next, build the transcriptome index with Salmon.
salmon index -t trasncripts.fa -i <index> -k 31
After building the index, you can quantify against this index using Salmon for accurate quantification.
I think this process should help avoid errors.
In the response mentioned at https://github.com/YY-TMU/InPACT/issues/10, I used the hg38.fa file. I attempted the step with the command "gffread merged.gtf -g hg38.fa -w transcripts.fa", during which the following error occurred:
Warning: couldn't find fasta record for 'chr10_KN196480v1_fix'! Error: no genomic sequence available (check -g option!). Despite the error, I reviewed the output results. Here is the head of the transcripts.fa file:
NM_001005484.2 CDS=61-1041 CCCAGATCTCTTCAGTTTTTATGCCTCATTCTGTGAAAATTGCTGTAGTCTCTTCCAGTTATGAAGAAGG TAACTGCAGAGGCTATTTCCTGGAATGAATCAACGAGTGAAACGAATAACTCTATGGTGACTGAATTCAT ... I decided to use this erroneous result to try subsequent operations and build the index. Here is the head of the quant.sf file:
Name Length EffectiveLength TPM NumReads NM_001005484.2 2618 2326.487 0.000000 0.000 NM_001005221.2 939 647.817 0.000000 0.000 ... The name format of the other transcript output results is consistent:
Name Length EffectiveLength TPM NumReads NM_001080438.1 248956422 248956133.3 66.538614 2386248.409 NM_001080438.1 133797422 133797133.3 43.224636 833100.172 NM_001080438.1 135086622 135086333.3 73.906931 1438187.977 The result is still the same error:
Traceback (most recent call last):
File "/root/miniconda3/envs/InPACT/bin/InPACT_quantify", line 103, in
Thank you.
We were unable to reproduce the error with gffread on our end. I recommend trying the command with different annotation files to verify the command. If the problem persists, feel free to send your complete code and files to my email (liuxc1995@tmu.edu.cn), and we’ll be happy to assist with troubleshooting. And the final ipa_usage.txt format is in README part.
In the third step, InPACT_quantify is running with the following error, and I would like to know which file is at fault
Traceback (most recent call last): File "/home/data/miniconda3/envs/InPACT/bin/InPACT_quantify", line 103, in
ipa_usage_dic = calculate_usage(transcript_tpm_path,
File "/home/data/miniconda3/envs/InPACT/bin/InPACT_quantify", line 48, in calculate_usage
gene_tpm_dic[gn] = sum([transcript_tpm_dic[transcript] for transcript in transcripts])
File "/home/data/miniconda3/envs/InPACT/bin/InPACT_quantify", line 48, in
gene_tpm_dic[gn] = sum([transcript_tpm_dic[transcript] for transcript in transcripts])
KeyError: 'NM_000350.3'
Thank you very much.