Dear @cariocow
Something weird is happening while reading the reference genome with pyfaidx. Could you send me the first few lines of the reference genome?
Best Andrey
Sorry, I fixed it by re-downloading the reference.
Then I got another issue. Here's the log.
Command line: /home/cario/bin/miniconda3/envs/isoquant/bin/isoquant.py --reference /Temporary-data/cario/reference/hg38_111/Homo_sapiens.GRCh38.dna.toplevel.fa --genedb /Temporary-data/cario/reference/hg38_111/Homo_sapiens.GRCh38.111.gtf --fastq /Temporary-data/cario/BT019327_sup/scnanogps_111/fastq/bt019327_c10.fastq --data_type nanopore -o /Temporary-data/cario/isoformswitchanalyser/isoquant -t 10
2024-06-26 11:16:48,460 - INFO - Running IsoQuant version 3.4.1
2024-06-26 11:16:48,460 - WARNING - Output folder already contains a previous run, some files may be overwritten. Use --resume to resume a failed run. Use --force to avoid this message.
2024-06-26 11:16:48,460 - WARNING - Press Ctrl+C to interrupt the run now.
2024-06-26 11:16:57,462 - INFO - Overwriting the previous run
2024-06-26 11:16:58,464 - WARNING - /Temporary-data/cario/isoformswitchanalyser/isoquant/OUT folder already exists, some files may be overwritten
2024-06-26 11:16:58,464 - WARNING - /Temporary-data/cario/isoformswitchanalyser/isoquant/OUT/aux folder already exists, some files may be overwritten
2024-06-26 11:16:58,465 - INFO - Novel unspliced transcripts will not be reported, set --report_novel_unspliced true to discover them
2024-06-26 11:16:58,465 - INFO - === IsoQuant pipeline started ===
2024-06-26 11:16:58,465 - INFO - gffutils version: 0.13
2024-06-26 11:16:58,465 - INFO - pysam version: 0.22.1
2024-06-26 11:16:58,465 - INFO - pyfaidx version: 0.8.1.1
2024-06-26 11:16:58,465 - INFO - Checking input gene annotation
2024-06-26 11:17:31,049 - INFO - Gene annotation seems to be correct
2024-06-26 11:17:31,248 - INFO - Converting gene annotation file to .db format (takes a while)...
2024-06-26 17:59:45,246 - INFO - Gene database written to /Temporary-data/cario/isoformswitchanalyser/isoquant/Homo_sapiens.GRCh38.111.db
2024-06-26 17:59:45,247 - INFO - Provide this database next time to avoid excessive conversion
2024-06-26 17:59:45,248 - INFO - Indexing reference
2024-06-26 17:59:45,249 - INFO - Converting gene annotation file /Temporary-data/cario/isoformswitchanalyser/isoquant/Homo_sapiens.GRCh38.111.db to .bed format
2024-06-26 18:01:14,013 - INFO - Gene database BED written to /Temporary-data/cario/isoformswitchanalyser/isoquant/Homo_sapiens.GRCh38.111.bed
2024-06-26 18:01:14,024 - INFO - Aligning /Temporary-data/cario/BT019327_sup/scnanogps_111/fastq/bt019327_c10.fastq to the reference, alignments will be saved to /Temporary-data/cario/isoformswitchanalyser/isoquant/OUT/aux/OUT_bt019327_c10_b3d719_467c1c_f98cde.bam
2024-06-26 18:01:14,027 - INFO - Running minimap2 version 2.28-r1209 (takes a while)
2024-06-26 18:01:37,859 - INFO - Sorting alignments
2024-06-26 18:01:40,610 - INFO - Indexing alignments
2024-06-26 18:01:41,794 - INFO - Loading gene database from /Temporary-data/cario/isoformswitchanalyser/isoquant/Homo_sapiens.GRCh38.111.db
2024-06-26 18:01:42,146 - INFO - Loading reference genome from /Temporary-data/cario/reference/hg38_111/Homo_sapiens.GRCh38.dna.toplevel.fa
2024-06-26 18:01:42,184 - INFO - Processing 1 experiment
2024-06-26 18:01:42,184 - INFO - Processing experiment OUT
2024-06-26 18:01:42,184 - INFO - Experiment has 1 BAM file: /Temporary-data/cario/isoformswitchanalyser/isoquant/OUT/aux/OUT_bt019327_c10_b3d719_467c1c_f98cde.bam
2024-06-26 18:01:42,184 - INFO - Collecting read alignments
2024-06-26 18:01:43,004 - INFO - Processing chromosome 6
2024-06-26 18:01:43,004 - INFO - Processing chromosome 3
2024-06-26 18:01:43,035 - INFO - Processing chromosome 5
2024-06-26 18:01:43,049 - INFO - Processing chromosome 1
2024-06-26 18:01:43,087 - INFO - Processing chromosome 2
2024-06-26 18:01:43,110 - INFO - Processing chromosome 8
2024-06-26 18:01:43,122 - INFO - Processing chromosome 7
2024-06-26 18:01:43,122 - INFO - Processing chromosome X
2024-06-26 18:01:43,143 - INFO - Processing chromosome 9
2024-06-26 18:01:43,153 - INFO - Processing chromosome 4
2024-06-26 18:01:43,774 - INFO - Processing chromosome 11
2024-06-26 18:01:43,774 - INFO - Processing chromosome 10
2024-06-26 18:01:43,812 - INFO - Processing chromosome 12
2024-06-26 18:01:43,856 - INFO - Processing chromosome 13
2024-06-26 18:01:43,881 - INFO - Processing chromosome 14
2024-06-26 18:01:43,907 - INFO - Processing chromosome 15
2024-06-26 18:01:43,911 - INFO - Processing chromosome 16
2024-06-26 18:01:43,970 - INFO - Processing chromosome 18
2024-06-26 18:01:43,975 - INFO - Processing chromosome 17
2024-06-26 18:01:44,098 - INFO - Processing chromosome 20
2024-06-26 18:01:44,490 - INFO - Processing chromosome 19
2024-06-26 18:01:44,502 - INFO - Processing chromosome Y
2024-06-26 18:01:44,532 - INFO - Processing chromosome 22
2024-06-26 18:01:44,568 - INFO - Processing chromosome 21
2024-06-26 18:01:44,606 - INFO - Processing chromosome HG76_PATCH
2024-06-26 18:01:44,981 - CRITICAL - IsoQuant failed with the following error, please, submit this issue to https://github.com/ablab/IsoQuant/issues
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/cario/bin/miniconda3/envs/isoquant/lib/python3.12/concurrent/futures/process.py", line 263, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cario/bin/miniconda3/envs/isoquant/lib/python3.12/concurrent/futures/process.py", line 212, in _process_chunk
    return [fn(*args) for args in chunk]
            ^^^^^^^^^
  File "/home/cario/bin/miniconda3/envs/isoquant/share/isoquant-3.4.1-0/src/dataset_processor.py", line 129, in collect_reads_in_parallel
    AlignmentCollector(chr_id, bam_file_pairs, args, illumina_bam, gffutils_db, current_chr_record, read_grouper)
  File "/home/cario/bin/miniconda3/envs/isoquant/share/isoquant-3.4.1-0/src/alignment_processor.py", line 240, in __init__
    self.bam_pairs[0][0].get_reference_length(self.chr_id),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pysam/libcalignmentfile.pyx", line 1919, in pysam.libcalignmentfile.AlignmentFile.get_reference_length
  File "pysam/libcalignmentfile.pyx", line 511, in pysam.libcalignmentfile.AlignmentHeader.get_reference_length
KeyError: 'unknown reference 1'
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/home/cario/bin/miniconda3/envs/isoquant/bin/isoquant.py", line 808, in <module>
Sounds like your GTF file and your FASTA reference have distinct chromosome names (e.g. "chr1" and "1"). Could you check this?
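One way to run that check is sketched below. It is only an illustration, not part of IsoQuant: the function names are made up for this example, and it assumes an uncompressed FASTA and GTF where the sequence name is the first whitespace-delimited token of each header or record.

```python
def fasta_names(path):
    """Collect sequence names: the first token after '>' on each header line."""
    names = set()
    # Read as bytes so a stray non-UTF-8 byte in a sequence line cannot abort the scan.
    with open(path, "rb") as handle:
        for raw in handle:
            if raw.startswith(b">"):
                fields = raw[1:].split()
                if fields:
                    names.add(fields[0].decode("ascii", errors="replace"))
    return names


def gtf_names(path):
    """Collect the first column (seqname) of every non-comment GTF record."""
    names = set()
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            names.add(line.split(None, 1)[0])
    return names


def compare_names(fasta_path, gtf_path):
    """Report which names are shared and which appear in only one file."""
    fa = fasta_names(fasta_path)
    gtf = gtf_names(gtf_path)
    return {"shared": fa & gtf, "gtf_only": gtf - fa, "fasta_only": fa - gtf}
```

If `gtf_only` contains most of the annotation's names (for example "chr1" where the FASTA has "1"), the two files use different naming schemes.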
Both of them seem to use the same chromosome names, i.e. "1" for chromosome 1.
The head of the GTF is listed below:
1 havana gene 182696 184174 . + . gene_id "ENSG00000279928"; gene_version "2"; gene_name "DDX11L17"; gene_source "havana"; gene_biotype "unprocessed_pseudogene";
1 havana transcript 182696 184174 . + . gene_id "ENSG00000279928"; gene_version "2"; transcript_id "ENST00000624431"; transcript_version "2"; gene_name "DDX11L17"; gene_source "havana"; gene_biotype "unprocessed_pseudogene"; transcript_name "DDX11L17-201"; transcript_source "havana"; transcript_biotype "unprocessed_pseudogene"; tag "basic"; tag "Ensembl_canonical"; transcript_support_level "NA";
And the headers of the reference FASTA are as follows:
list_fasta_headers("/Temporary-data/cario/reference/hg38_111/Homo_sapiens.GRCh38.dna.toplevel.fa")
1 dna:chromosome chromosome:GRCh38:1:1:248956422:1 REF
2 dna:chromosome chromosome:GRCh38:2:1:242193529:1 REF
3 dna:chromosome chromosome:GRCh38:3:1:198295559:1 REF
4 dna:chromosome chromosome:GRCh38:4:1:190214555:1 REF
5 dna:chromosome chromosome:GRCh38:5:1:181538259:1 REF
6 dna:chromosome chromosome:GRCh38:6:1:170805979:1 REF
7 dna:chromosome chromosome:GRCh38:7:1:159345973:1 REF
8 dna:chromosome chromosome:GRCh38:8:1:145138636:1 REF
9 dna:chromosome chromosome:GRCh38:9:1:138394717:1 REF
10 dna:chromosome chromosome:GRCh38:10:1:133797422:1 REF
11 dna:chromosome chromosome:GRCh38:11:1:135086622:1 REF
12 dna:chromosome chromosome:GRCh38:12:1:133275309:1 REF
13 dna:chromosome chromosome:GRCh38:13:1:114364328:1 REF
14 dna:chromosome chromosome:GRCh38:14:1:107043718:1 REF
15 dna:chromosome chromosome:GRCh38:15:1:101991189:1 REF
16 dna:chromosome chromosome:GRCh38:16:1:90338345:1 REF
17 dna:chromosome chromosome:GRCh38:17:1:83257441:1 REF
18 dna:chromosome chromosome:GRCh38:18:1:80373285:1 REF
19 dna:chromosome chromosome:GRCh38:19:1:58617616:1 REF
20 dna:chromosome chromosome:GRCh38:20:1:64444167:1 REF
21 dna:chromosome chromosome:GRCh38:21:1:46709983:1 REF
22 dna:chromosome chromosome:GRCh38:22:1:50818468:1 REF
X dna:chromosome chromosome:GRCh38:X:1:156040895:1 REF
Y dna:chromosome chromosome:GRCh38:Y:1:57227415:1 REF
MT dna:chromosome chromosome:GRCh38:MT:1:16569:1 REF
...
For your information, I got them from Ensembl.
gtf: https://ftp.ensembl.org/pub/release-111/gtf/homo_sapiens/Homo_sapiens.GRCh38.111.gtf.gz
ref: https://ftp.ensembl.org/pub/release-112/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.toplevel.fa.gz
Thanks for your help :)
It seems that the problem is in the BAM file. Could you send me a few lines and a header from the BAM file?
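A rough sketch of how the BAM header could be inspected for this: the `KeyError: 'unknown reference 1'` raised by pysam suggests the header has no `@SQ` entry named "1". The helpers below are hypothetical (not IsoQuant or pysam functions) and work on the plain header text, which can be obtained with `samtools view -H` on the BAM file.

```python
def sam_header_references(header_text):
    """Map reference name -> length using the @SQ lines of a SAM header."""
    refs = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs, e.g. SN:1 and LN:248956422.
        tags = dict(
            field.split(":", 1) for field in line.split("\t")[1:] if ":" in field
        )
        if "SN" in tags:
            refs[tags["SN"]] = int(tags["LN"]) if "LN" in tags else None
    return refs


def missing_references(header_text, expected_names):
    """Return the expected names that have no @SQ entry in the header."""
    present = sam_header_references(header_text)
    return sorted(name for name in expected_names if name not in present)
```

Passing the chromosome names from the FASTA or GTF as `expected_names` shows which ones the alignment file does not know about.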
Please reopen if the issue is still there.
Hi, thanks for making IsoQuant. When I ran it on my ONT sequencing data, I got the following error. Could you please help me fix it? Many thanks! :)
Command line: /home/cario/bin/miniconda3/envs/isoquant/bin/isoquant.py --reference /Temporary-data/cario/reference/hg38_111/Homo_sapiens.GRCh38.dna.toplevel.fa --genedb /Temporary-data/cario/reference/hg38_111/Homo_sapiens.GRCh38.111.gtf --fastq /Temporary-data/cario/BT019327_sup/scnanogps_111/fastq/bt019327_c10.fastq --data_type nanopore -o /Temporary-data/cario/isoformswitchanalyser/isoquant -t 10
2024-06-25 17:50:48,444 - INFO - Running IsoQuant version 3.4.1
2024-06-25 17:50:48,444 - WARNING - Output folder already contains a previous run, some files may be overwritten. Use --resume to resume a failed run. Use --force to avoid this message.
2024-06-25 17:50:48,444 - WARNING - Press Ctrl+C to interrupt the run now.
2024-06-25 17:50:57,446 - INFO - Overwriting the previous run
2024-06-25 17:50:58,479 - WARNING - /Temporary-data/cario/isoformswitchanalyser/isoquant/OUT folder already exists, some files may be overwritten
2024-06-25 17:50:58,480 - WARNING - /Temporary-data/cario/isoformswitchanalyser/isoquant/OUT/aux folder already exists, some files may be overwritten
2024-06-25 17:50:58,480 - INFO - Novel unspliced transcripts will not be reported, set --report_novel_unspliced true to discover them
2024-06-25 17:50:58,481 - INFO - === IsoQuant pipeline started ===
2024-06-25 17:50:58,481 - INFO - gffutils version: 0.13
2024-06-25 17:50:58,481 - INFO - pysam version: 0.22.1
2024-06-25 17:50:58,481 - INFO - pyfaidx version: 0.8.1.1
2024-06-25 17:50:58,481 - INFO - Checking input gene annotation
2024-06-25 17:51:33,997 - INFO - Gene annotation seems to be correct
2024-06-25 17:51:34,187 - INFO - Converting gene annotation file to .db format (takes a while)...
2024-06-26 00:30:25,704 - INFO - Gene database written to /Temporary-data/cario/isoformswitchanalyser/isoquant/Homo_sapiens.GRCh38.111.db
2024-06-26 00:30:25,705 - INFO - Provide this database next time to avoid excessive conversion
2024-06-26 00:30:25,707 - INFO - Indexing reference
2024-06-26 00:30:28,972 - INFO - Converting gene annotation file /Temporary-data/cario/isoformswitchanalyser/isoquant/Homo_sapiens.GRCh38.111.db to .bed format
2024-06-26 00:31:51,928 - INFO - Gene database BED written to /Temporary-data/cario/isoformswitchanalyser/isoquant/Homo_sapiens.GRCh38.111.bed
2024-06-26 00:31:51,939 - INFO - Aligning /Temporary-data/cario/BT019327_sup/scnanogps_111/fastq/bt019327_c10.fastq to the reference, alignments will be saved to /Temporary-data/cario/isoformswitchanalyser/isoquant/OUT/aux/OUT_bt019327_c10_9142ca_ba9abe_2e640b.bam
2024-06-26 00:31:51,942 - INFO - Running minimap2 version 2.28-r1209 (takes a while)
2024-06-26 00:32:13,944 - INFO - Sorting alignments
2024-06-26 00:32:16,202 - INFO - Indexing alignments
2024-06-26 00:32:17,372 - INFO - Loading gene database from /Temporary-data/cario/isoformswitchanalyser/isoquant/Homo_sapiens.GRCh38.111.db
2024-06-26 00:32:17,696 - INFO - Loading reference genome from /Temporary-data/cario/reference/hg38_111/Homo_sapiens.GRCh38.dna.toplevel.fa
2024-06-26 00:32:17,703 - CRITICAL - IsoQuant failed with the following error, please, submit this issue to https://github.com/ablab/IsoQuant/issues
Traceback (most recent call last):
  File "/home/cario/bin/miniconda3/envs/isoquant/bin/isoquant.py", line 808, in <module>
    main(sys.argv[1:])
  File "/home/cario/bin/miniconda3/envs/isoquant/bin/isoquant.py", line 802, in main
    run_pipeline(args)
  File "/home/cario/bin/miniconda3/envs/isoquant/bin/isoquant.py", line 754, in run_pipeline
    dataset_processor = DatasetProcessor(args)
                        ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cario/bin/miniconda3/envs/isoquant/share/isoquant-3.4.1-0/src/dataset_processor.py", line 403, in __init__
    self.reference_record_dict = Fasta(self.args.reference, indexname=args.fai_file_name)
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cario/bin/miniconda3/envs/isoquant/lib/python3.12/site-packages/pyfaidx/__init__.py", line 1090, in __init__
    self.faidx = Faidx(
                 ^^^^^^
  File "/home/cario/bin/miniconda3/envs/isoquant/lib/python3.12/site-packages/pyfaidx/__init__.py", line 505, in __init__
    self.build_index()
  File "/home/cario/bin/miniconda3/envs/isoquant/lib/python3.12/site-packages/pyfaidx/__init__.py", line 606, in build_index
    line = line.decode()
           ^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd6 in position 16: invalid continuation byte
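The traceback shows pyfaidx failing on `line.decode()` while building the index, so some line of the FASTA contains bytes that are not valid UTF-8. A minimal sketch for locating such lines is shown below; `find_undecodable_lines` is a hypothetical helper written for this illustration, not a pyfaidx function, and it assumes the file is a plain (uncompressed) FASTA.

```python
def find_undecodable_lines(path, encoding="utf-8", limit=5):
    """Return (line_number, byte_offset_in_line, offending_byte_hex) tuples
    for the first `limit` lines that cannot be decoded with `encoding`."""
    problems = []
    with open(path, "rb") as handle:
        for line_number, raw in enumerate(handle, start=1):
            try:
                raw.decode(encoding)
            except UnicodeDecodeError as err:
                bad_byte = raw[err.start:err.start + 1].hex()
                problems.append((line_number, err.start, bad_byte))
                if len(problems) >= limit:
                    break
    return problems
```

Any hit in a reference genome usually means the file is damaged or is not the plain-text FASTA it is expected to be, in which case downloading it again is a reasonable first step.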