bcgsc / NanoSim

Nanopore sequence read simulator

Error: 'NoneType' object has no attribute 'encode' #109

Closed by TrentPrall 3 years ago

TrentPrall commented 3 years ago

Hello,

I keep running into the following error while trying to run read_analysis.py. I am using a Dockerized version of NanoSim.

Here's the command:

docker run --user $(id -u):$(id -g) -v $(pwd):/scratch -w /scratch quay.io/biocontainers/nanosim:2.6.0--0 \
  read_analysis.py \
  genome \
  -i reads.fasta \
  -r GCA_011100615.1_Macaca_fascicularis_6.0_genomic.fna

And the output:

[M::mm_idx_gen::88.972*1.00] collected minimizers
[M::mm_idx_gen::129.677*0.99] sorted minimizers
[M::main::129.677*0.99] loaded/built the index for 936 target sequence(s)
[M::mm_mapopt_update::132.148*0.99] mid_occ = 583
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 936
[M::mm_idx_stat::132.999*0.99] distinct minimizers: 100954079 (39.03% are singletons); average occurrences: 5.351; average spacing: 5.380
[M::worker_pipeline::1071.743*1.00] mapped 211242 sequences
[M::worker_pipeline::1077.395*1.00] mapped 1967 sequences
[M::main] Version: 2.17-r941
[M::main] CMD: minimap2 --cs --MD -ax map-ont -t 1 GCA_011100615.1_Macaca_fascicularis_6.0_genomic.fna training_processed.fasta
[M::main] Real time: 1077.486 sec; CPU: 1075.534 sec; Peak RSS: 10.935 GB

Running the code with following parameters:

infile reads.fasta
ref_g GCA_011100615.1_Macaca_fascicularis_6.0_genomic.fna
aligner minimap2
g_alnm 
prefix training
num_threads 1
model_fit True
2021-03-13 16:55:01: Read pre-process
2021-03-13 16:55:04: Alignment with minimap2
2021-03-13 17:13:02: Processing alignment file: sam
Traceback (most recent call last):
  File "/usr/local/bin/read_analysis.py", line 552, in <module>
    main()
  File "/usr/local/bin/read_analysis.py", line 416, in main
    alnm_ext, unaligned_length, strandness = align_genome(in_fasta, prefix, aligner, num_threads, g_alnm, ref_g)
  File "/usr/local/bin/read_analysis.py", line 160, in align_genome
    unaligned_length, strandness = get_primary_sam.primary_and_unaligned(g_alnm, prefix)
  File "/usr/local/bin/get_primary_sam.py", line 15, in primary_and_unaligned
    for alnm in alignments:
  File "/usr/local/lib/python3.7/site-packages/HTSeq/__init__.py", line 854, in __iter__
    yield SAM_Alignment.from_pysam_AlignedSegment(pa, self.sf)
  File "python3/src/HTSeq/_HTSeq.pyx", line 1379, in HTSeq._HTSeq.SAM_Alignment.from_pysam_AlignedSegment
AttributeError: 'NoneType' object has no attribute 'encode'

Any thoughts on how to proceed? Thank you.

cheny19 commented 3 years ago

Could you try the latest version on GitHub? It doesn't require any installation; you can just clone the repo. We have NanoSim v3 already.

zjzace commented 3 years ago

Hi, Cheny19,

I also got this error message. I have cloned the latest repo. I am running it on Ubuntu 18.04. Below is my command:

~/software/NanoSim/src/read_analysis.py transcriptome -annot Homo_sapiens.GRCh38.103.gtf -i Cyto.fq -rg GRCh38.primary_assembly.genome.fa -rt Homo_sapiens.GRCh38.103.fa -o training -t 36

Could you please help me out?

zjzace commented 3 years ago

Hi,

I think this may be related to the pysam version. I replaced it with 0.9.1, as suggested in the README file, and it now works well.

cheny19 commented 3 years ago

Cool! Which version of pysam did you use when it didn't work?
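
For anyone answering the version question above, here is a minimal sketch that reports which pysam and HTSeq versions are installed in the active Python environment. It uses only the standard library (`importlib.metadata`, which needs Python 3.8 or newer); the helper name `installed_versions` is made up for illustration and is not part of NanoSim.

```python
from importlib import metadata


def installed_versions(packages=("pysam", "HTSeq")):
    """Return {package: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is absent from this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Run it with the same interpreter that runs read_analysis.py, otherwise it may report the versions from a different environment.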

865699871 commented 3 years ago

I have the same problem, and my HTSeq version is 0.13.5.

kmnip commented 3 years ago

@865699871 I can confirm that this error does indeed come from HTSeq's SAM/BAM readers.

The latest code in the master branch contains a bug fix for this issue, which uses pysam (instead of HTSeq) for reading SAM format files.
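
To illustrate the kind of processing that step performs, here is a rough sketch of separating primary alignments from unaligned reads in SAM text. It is not NanoSim's code and it deliberately uses neither pysam nor HTSeq: it parses the tab-separated SAM fields directly with the standard library and relies only on the FLAG bits defined in the SAM specification. The function name `split_primary_and_unaligned` is hypothetical, and treating this as what `primary_and_unaligned` in the traceback does is an assumption based on its name.

```python
# FLAG bits from the SAM specification.
UNMAPPED = 0x4
REVERSE = 0x10
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def split_primary_and_unaligned(sam_lines):
    """Return (primary, unaligned_lengths) from an iterable of SAM lines.

    primary: list of (read name, reference name, strand) tuples
    unaligned_lengths: sequence lengths of reads flagged as unmapped
    """
    primary = []
    unaligned_lengths = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])   # column 2: FLAG
        seq = fields[9]         # column 10: SEQ ("*" if not stored)
        if flag & UNMAPPED:
            unaligned_lengths.append(0 if seq == "*" else len(seq))
        elif not flag & (SECONDARY | SUPPLEMENTARY):
            strand = "-" if flag & REVERSE else "+"
            primary.append((fields[0], fields[2], strand))
    return primary, unaligned_lengths
```

It can be called on an open file handle, for example `with open("training-sam") as fh: split_primary_and_unaligned(fh)`, where the file name is a placeholder for whatever SAM file the aligner produced.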

kmnip commented 3 years ago

This issue is fixed. Please let us know if you have any more problems.