ConesaLab / SQANTI3

Tool for the Quality Control of Long-Read Defined Transcriptomes
GNU General Public License v3.0

Error generating Kallisto index #257

Closed Upendra19993 closed 8 months ago

Upendra19993 commented 8 months ago

Hi,

I want to run SQANTI3 on my own dataset. To get familiar with the tool, I first tried it with the example dataset you provide. I ran the SQANTI3 quality control step and got an error. The full output is below.

The command I used is: sqanti3_qc.py UHR_chr22.gtf gencode.v38.basic_chr22.gtf GRCh38.p13_chr22.fasta -o UHR_chr22 -d SQANTI3_output --short_reads UHR_chr22_short_reads.fofn --cpus 4 --report both

The job's progress and error messages are below.

(base) [uqwwijes@bun025 SQANTI3_reinstallation_2]$ sqanti3_qc.py UHR_chr22.gtf gencode.v38.basic_chr22.gtf GRCh38.p13_chr22.fasta -o UHR_chr22 -d SQANTI3_output --short_reads UHR_chr22_short_reads.fofn --cpus 4 --report both
Rscript (R) version 4.3.1 (2023-06-16)
ERROR: genome fasta /scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/SQANTI3_reinstallation_2/GRCh38.p13_chr22.fasta doesn't exist. Abort!
(base) [uqwwijes@bun025 SQANTI3_reinstallation_2]$ cd ..
(base) [uqwwijes@bun025 Exampla_data]$ sqanti3_qc.py UHR_chr22.gtf gencode.v38.basic_chr22.gtf GRCh38.p13_chr22.fasta -o UHR_chr22 -d SQANTI3_output --short_reads UHR_chr22_short_reads.fofn --cpus 4 --report both
Rscript (R) version 4.3.1 (2023-06-16)
Write arguments to /scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/SQANTI3_output/UHR_chr22.params.txt...
Running SQANTI3...
Parsing provided files....
Reading genome fasta /scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/GRCh38.p13_chr22.fasta....
Skipping aligning of sequences because GTF file was provided.

Indels will be not calculated since you ran SQANTI3 without alignment step (SQANTI3 with gtf format as transcriptome input).
Predicting ORF sequences...
Parsing Reference Transcriptome....
Parsing Isoforms....
Running STAR for calculating Short-Read Coverage.
START running STAR...
Running indexing...
/sw/local/rocky8/noarch/qcif/software/miniconda3/envs/sqanti3_5.2/bin/STAR-avx2 --runThreadN 4 --runMode genomeGenerate --genomeDir /scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/SQANTI3_output/STAR_index/ --genomeFastaFiles /scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/GRCh38.p13_chr22.fasta --outTmpDir /scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/SQANTI3_output/STAR_index//_STARtmp/
STAR version: 2.7.11b compiled: 2024-01-29T15:15:38+0000 :/opt/conda/conda-bld/star_1706541070242/work/source
Feb 27 12:20:52 ..... started STAR run
Feb 27 12:20:52 ... starting to generate Genome files
!!!!! WARNING: --genomeSAindexNbases 14 is too large for the genome size=50818468, which may cause seg-fault at the mapping step. Re-run genome generation with recommended --genomeSAindexNbases 11
Feb 27 12:20:53 ... starting to sort Suffix Array. This may take a long time...
Feb 27 12:20:53 ... sorting Suffix Array chunks and saving them to disk...
Feb 27 12:21:02 ... loading chunks from disk, packing SA...
Feb 27 12:21:02 ... finished generating suffix array
Feb 27 12:21:02 ... generating Suffix Array index
Feb 27 12:21:11 ... completed Suffix Array index
Feb 27 12:21:11 ... writing Genome to disk ...
Feb 27 12:21:11 ... writing Suffix Array to disk ...
Feb 27 12:21:11 ... writing SAindex to disk
Feb 27 12:21:11 ..... finished successfully
Indexing done.
Mapping for UHR_Rep1_chr22.R1 : in progress...
Mapping for UHR_Rep1_chr22.R1 : done.
/sw/local/rocky8/noarch/qcif/software/miniconda3/envs/sqanti3_5.2/bin/STAR-avx2 --runThreadN 4 --genomeDir /scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/SQANTI3_output/STAR_index/ --readFilesIn /scratch/project/qaafi-cnafs/upendra/Sqanti3/Example/UHR_Rep1_chr22.R1.fastq /scratch/project/qaafi-cnafs/upendra/Sqanti3/Example/UHR_Rep1_chr22.R2.fastq --outFileNamePrefix /scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/SQANTI3_output/STAR_mapping/UHR_Rep1_chr22.R1 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --outFilterType BySJout --outSAMunmapped Within --outFilterMultimapNmax 20 --outFilterMismatchNoverLmax 0.04 --outFilterMismatchNmax 999 --alignIntronMin 20 --alignIntronMax 1000000 --alignMatesGapMax 1000000 --sjdbScore 1 --genomeLoad NoSharedMemory --outSAMtype BAM SortedByCoordinate --twopassMode Basic
STAR version: 2.7.11b compiled: 2024-01-29T15:15:38+0000 :/opt/conda/conda-bld/star_1706541070242/work/source
Feb 27 12:21:12 ..... started STAR run
Feb 27 12:21:12 ..... loading genome
Feb 27 12:21:12 ..... started 1st pass mapping
Feb 27 12:21:36 ..... finished 1st pass mapping
Feb 27 12:21:36 ..... inserting junctions into the genome indices
Feb 27 12:21:44 ..... started mapping
Feb 27 12:22:09 ..... finished mapping
Feb 27 12:22:09 ..... started sorting BAM
Feb 27 12:22:09 ..... finished successfully
Mapping for UHR_Rep2_chr22.R1 : in progress...
Mapping for UHR_Rep2_chr22.R1 : done.
/sw/local/rocky8/noarch/qcif/software/miniconda3/envs/sqanti3_5.2/bin/STAR-avx2 --runThreadN 4 --genomeDir /scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/SQANTI3_output/STAR_index/ --readFilesIn /scratch/project/qaafi-cnafs/upendra/Sqanti3/Example/UHR_Rep2_chr22.R1.fastq /scratch/project/qaafi-cnafs/upendra/Sqanti3/Example/UHR_Rep2_chr22.R2.fastq --outFileNamePrefix /scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/SQANTI3_output/STAR_mapping/UHR_Rep2_chr22.R1 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --outFilterType BySJout --outSAMunmapped Within --outFilterMultimapNmax 20 --outFilterMismatchNoverLmax 0.04 --outFilterMismatchNmax 999 --alignIntronMin 20 --alignIntronMax 1000000 --alignMatesGapMax 1000000 --sjdbScore 1 --genomeLoad NoSharedMemory --outSAMtype BAM SortedByCoordinate --twopassMode Basic
STAR version: 2.7.11b compiled: 2024-01-29T15:15:38+0000 :/opt/conda/conda-bld/star_1706541070242/work/source
Feb 27 12:22:10 ..... started STAR run
Feb 27 12:22:10 ..... loading genome
Feb 27 12:22:10 ..... started 1st pass mapping
Feb 27 12:22:28 ..... finished 1st pass mapping
Feb 27 12:22:28 ..... inserting junctions into the genome indices
Feb 27 12:22:36 ..... started mapping
Feb 27 12:22:55 ..... finished mapping
Feb 27 12:22:55 ..... started sorting BAM
Feb 27 12:22:55 ..... finished successfully
Input pattern: /scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/SQANTI3_output/STAR_mapping/.
The following files found and to be read as junctions:
/scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/SQANTI3_output/STAR_mapping/UHR_Rep2_chr22.R1SJ.out.tab
/scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/SQANTI3_output/STAR_mapping/UHR_Rep1_chr22.R1SJ.out.tab
6762 junctions read. 2 junctions added to both strands because no strand information from STAR.
Running calculation of TSS ratio
BAM files identified: ['/scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/SQANTI3_output/STAR_mapping//UHR_Rep1_chr22.R1Aligned.sortedByCoord.out.bam', '/scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/SQANTI3_output/STAR_mapping//UHR_Rep2_chr22.R1Aligned.sortedByCoord.out.bam']
Temp files removed.

Performing Classification of Isoforms....
Number of classified isoforms: 3925
RT-switching computation....
Full-length read abundance files not provided.
Adding TSS ratio data...
**** Running Kallisto to calculate isoform expressions.
Running kallisto index /scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/SQANTI3_output/kallisto_output/kallisto_corrected_fasta.idx using as reference /scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/SQANTI3_output/UHR_chr22_corrected.fasta

**Running Kallisto quantification for UHR_Rep1_chr22.R1 sample

Error: kallisto index file not found /scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/SQANTI3_output/kallisto_output/kallisto_corrected_fasta.idx

Usage: kallisto quant [arguments] FASTQ-files

Required arguments:
-i, --index=STRING Filename for the kallisto index to be used for quantification
-o, --output-dir=STRING Directory to write output to

Optional arguments:
-b, --bootstrap-samples=INT Number of bootstrap samples (default: 0)
--seed=INT Seed for the bootstrap sampling (default: 42)
--plaintext Output plaintext instead of HDF5
--single Quantify single-end reads
--single-overhang Include reads where unobserved rest of fragment is predicted to lie outside a transcript
--fr-stranded Strand specific reads, first read forward
--rf-stranded Strand specific reads, first read reverse
-l, --fragment-length=DOUBLE Estimated average fragment length
-s, --sd=DOUBLE Estimated standard deviation of fragment length (default: -l, -s values are estimated from paired end data, but are required when using --single)
-t, --threads=INT Number of threads to use (default: 1)
--verbose Print out progress information every 1M proccessed reads

Running Kallisto quantification for UHR_Rep2_chr22.R1 sample

Error: kallisto index file not found /scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/SQANTI3_output/kallisto_output/kallisto_corrected_fasta.idx

Usage: kallisto quant [arguments] FASTQ-files

Required arguments:
-i, --index=STRING Filename for the kallisto index to be used for quantification
-o, --output-dir=STRING Directory to write output to

Optional arguments:
-b, --bootstrap-samples=INT Number of bootstrap samples (default: 0)
--seed=INT Seed for the bootstrap sampling (default: 42)
--plaintext Output plaintext instead of HDF5
--single Quantify single-end reads
--single-overhang Include reads where unobserved rest of fragment is predicted to lie outside a transcript
--fr-stranded Strand specific reads, first read forward
--rf-stranded Strand specific reads, first read reverse
-l, --fragment-length=DOUBLE Estimated average fragment length
-s, --sd=DOUBLE Estimated standard deviation of fragment length (default: -l, -s values are estimated from paired end data, but are required when using --single)
-t, --threads=INT Number of threads to use (default: 1)
--verbose Print out progress information every 1M proccessed reads

Traceback (most recent call last):
  File "/sw/local/rocky8/noarch/qcif/software/SQANTI3-5.2/sqanti3_qc.py", line 2542, in <module>
    main()
  File "/sw/local/rocky8/noarch/qcif/software/SQANTI3-5.2/sqanti3_qc.py", line 2525, in main
    run(args)
  File "/sw/local/rocky8/noarch/qcif/software/SQANTI3-5.2/sqanti3_qc.py", line 1978, in run
    exp_dict = expression_parser(expression_files)
  File "/sw/local/rocky8/noarch/qcif/software/SQANTI3-5.2/sqanti3_qc.py", line 806, in expression_parser
    reader = DictReader(open(exp_file), delimiter='\t')
FileNotFoundError: [Errno 2] No such file or directory: '/scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/SQANTI3_output/kallisto_output/UHR_Rep1_chr22.R1/abundance.tsv'**

Could you please let me know how I can fix this issue?

Many thanks, Upendra.

carolinamonzo commented 8 months ago

Hi @Upendra19993, that's very unfortunate. It looks like your run stopped because the kallisto index needed for quantification is missing. This is the root error that makes everything downstream fail:

Error: kallisto index file not found /scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/SQANTI3_output/kallisto_output/kallisto_corrected_fasta.idx

Do you have kallisto installed? Can you try building the kallisto index from the corrected fasta file yourself?
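A minimal sketch of that manual check, assuming kallisto is expected on your PATH. The function names here are hypothetical helpers, not part of SQANTI3; the only kallisto call used is the standard `kallisto index -i <index> <fasta>` subcommand. It checks that the index file was actually written instead of trusting the exit code alone, since in the log above the indexing step printed no error but left no file behind.

```python
import shutil
import subprocess
from pathlib import Path


def build_index_command(fasta, index_path):
    """Return the argument list for `kallisto index -i <index> <fasta>`."""
    return ["kallisto", "index", "-i", str(index_path), str(fasta)]


def build_kallisto_index(fasta, index_path):
    """Build a kallisto index and confirm that the index file exists.

    Hypothetical helper for troubleshooting; raises RuntimeError if
    kallisto is missing or the index file was not produced.
    """
    if shutil.which("kallisto") is None:
        raise RuntimeError("kallisto was not found on PATH")

    index_path = Path(index_path)
    # kallisto does not create missing parent directories for the index
    index_path.parent.mkdir(parents=True, exist_ok=True)

    result = subprocess.run(
        build_index_command(fasta, index_path),
        capture_output=True,
        text=True,
    )
    # Check for the file itself as well as the exit code
    if result.returncode != 0 or not index_path.is_file():
        raise RuntimeError(
            f"kallisto index did not produce {index_path}\n{result.stderr}"
        )
    return index_path
```

For the run above, this would be called as `build_kallisto_index("SQANTI3_output/UHR_chr22_corrected.fasta", "SQANTI3_output/kallisto_output/kallisto_corrected_fasta.idx")`. If it raises, the message from kallisto should show why the index step fails.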

eprdz commented 8 months ago

I think there is a problem with the latest version of kallisto. I had a similar problem with kallisto v0.50.1. I downgraded with: conda install "bioconda::kallisto<0.50.1" and could then build the kallisto index.
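To see which version is active in the environment before downgrading, something like the sketch below may help. It assumes `kallisto version` prints a line containing a dotted version number such as `kallisto, version 0.48.0`. The helper names are hypothetical, and treating every release from 0.50.1 onward as suspect is an assumption: only 0.50.1 itself was reported as failing in this thread.

```python
import re
import subprocess


def parse_kallisto_version(text):
    """Extract (major, minor, patch) from text like 'kallisto, version 0.48.0'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in: {text!r}")
    return tuple(int(part) for part in match.groups())


def is_suspect_version(version, first_bad=(0, 50, 1)):
    """Return True for versions at or above the one reported as failing.

    Assumption: later releases are flagged too, which this thread does not confirm.
    """
    return version >= first_bad


def installed_kallisto_version():
    """Run `kallisto version` and parse its output."""
    result = subprocess.run(["kallisto", "version"], capture_output=True, text=True)
    return parse_kallisto_version(result.stdout + result.stderr)
```

If `is_suspect_version(installed_kallisto_version())` returns True, the conda downgrade above is worth trying.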

Upendra19993 commented 8 months ago

Hi Carolinamonzó and eprdz,

I did have kallisto installed but still hit this issue. I then tried a different version (kallisto 0.48.0), and it worked: I got the results without any errors. Thank you both!