ZiyueYang01 / VirID

VirID: An integrated platform for the discovery and characterization of RNA Viruses
MIT License

TypeError: expected str, bytes or os.PathLike object, not int #10

Open vinicius-santos-bmc opened 1 month ago

vinicius-santos-bmc commented 1 month ago

Hi! I had this issue:

time VirID assembly_and_basic_annotation -i SRR10078305_1.fastq -i2 SRR10078305_2.fastq -out_dir virid_output --threads 24
[2024-10-01 11:41:45] INFO: VirID v3.1.0
[2024-10-01 11:41:45] INFO: VirID assembly_and_basic_annotation -i SRR10078305_1.fastq -i2 SRR10078305_2.fastq -out_dir virid_output --threads 24
[2024-10-01 11:41:45] TASK: START Primary Screen PART...
[2024-10-01 11:41:45] INFO: [assembly_and_basic_annotation] Quality control of sequencing data
[2024-10-01 11:44:09] INFO: [assembly_and_basic_annotation] Remove rRNA
[2024-10-01 11:50:40] INFO: [assembly_and_basic_annotation] Use megahit to splice reads into contigs
[2024-10-01 11:52:28] INFO: [assembly_and_basic_annotation] Running diamond blastx to compare /home/vinisantos/anaconda3/envs/mamba/envs/virid/lib/python3.12/site-packages/VirID/data/diamond_database/RdRP_230330_rmdup
[2024-10-01 11:52:29] ERROR: Uncontrolled exit resulting from an unexpected error.

================================================================================
EXCEPTION: TypeError
  MESSAGE: expected str, bytes or os.PathLike object, not int
________________________________________________________________________________

Traceback (most recent call last):
  File "/home/vinisantos/anaconda3/envs/mamba/envs/virid/lib/python3.12/site-packages/VirID/__main__.py", line 55, in main
    gt_parser.parse_options(args)
  File "/home/vinisantos/anaconda3/envs/mamba/envs/virid/lib/python3.12/site-packages/VirID/main.py", line 155, in parse_options
    self.assembly_and_basic_annotation(options)
  File "/home/vinisantos/anaconda3/envs/mamba/envs/virid/lib/python3.12/site-packages/VirID/main.py", line 51, in assembly_and_basic_annotation
    accession_tax_VirusesFlitered_file = assembly_and_basic_annotation_item.run(options,rm_rRNA_file,filename)
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vinisantos/anaconda3/envs/mamba/envs/virid/lib/python3.12/site-packages/VirID/assembly_and_basic_annotation.py", line 120, in run
    self._diamond_item(RdRP_DB_PATH,rdrp_out_type,megahit_out_fasta,rdrp_out_fasta,output_rdrp_tsv,model)
  File "/home/vinisantos/anaconda3/envs/mamba/envs/virid/lib/python3.12/site-packages/VirID/assembly_and_basic_annotation.py", line 72, in _diamond_item
    diamond_item.run(input_file,out_tsv,model)
  File "/home/vinisantos/anaconda3/envs/mamba/envs/virid/lib/python3.12/site-packages/VirID/external/blast.py", line 95, in run
    proc = subprocess.Popen(
           ^^^^^^^^^^^^^^^^^
  File "/home/vinisantos/anaconda3/envs/mamba/envs/virid/lib/python3.12/subprocess.py", line 1026, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/home/vinisantos/anaconda3/envs/mamba/envs/virid/lib/python3.12/subprocess.py", line 1885, in _execute_child
    self.pid = _fork_exec(
               ^^^^^^^^^^^
TypeError: expected str, bytes or os.PathLike object, not int
================================================================================

What could I be doing wrong?

ZiyueYang01 commented 1 month ago

Thanks, there was indeed a problem in blast.py; we have fixed it in the new version (v2.0.2).

args = ['diamond', 'blastx', '-q', origin_file, '-d', self.database_path, '-o', output_tsv,
        '-e', '1E-4', '--query-gencode', str(self.translate_table), '-k', str(1),
        '-p', str(self.threads), '-f', str(6)]
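
For readers hitting the same TypeError: subprocess only accepts str, bytes or os.PathLike items in the argument list, so an int such as a thread count has to be cast first. A minimal sketch of that idea follows; build_diamond_args and int_argument_is_rejected are hypothetical helper names for illustration, not functions from VirID, and only the flags shown in the snippet above are used.

```python
import subprocess
import sys


def build_diamond_args(query, database, out_tsv, translate_table=1, threads=1):
    """Build a diamond blastx argv list in which every element is a string.

    Hypothetical helper for illustration; not part of VirID.
    """
    raw = [
        "diamond", "blastx",
        "-q", query,
        "-d", database,
        "-o", out_tsv,
        "-e", "1E-4",
        "--query-gencode", translate_table,
        "-k", 1,
        "-p", threads,
        "-f", 6,
    ]
    # subprocess accepts only str, bytes or os.PathLike items, so cast numbers.
    return [str(item) for item in raw]


def int_argument_is_rejected():
    """Return True if subprocess raises TypeError for an int in the argv list."""
    try:
        # The int 24 is rejected before any child process is started.
        subprocess.run([sys.executable, "-c", "pass", 24], check=False)
    except TypeError:
        return True
    return False
```

Casting in one place, right before the list is handed to subprocess, avoids having to remember str() at every call site.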

In addition, we use fastp, which processes sequence data faster.

vinicius-santos-bmc commented 1 month ago

I'm still having a problem, but now after the contamination-removal process:

fastp -i SRR10078305_1.fastq -I SRR10078305_2.fastq -o virid_out_path/assembly_and_basic_annotation/step3_QC_1.fq -O virid_out_path/assembly_and_basic_annotation/step3_QC_2.fq -h virid_out_path/assembly_and_basic_annotation/step3_QC.html --detect_adapter_for_pe --dedup --dup_calc_accuracy 4 --dont_eval_duplication --low_complexity_filter --thread 24
fastp v0.23.4, time used: 60 seconds
[2024-10-09 22:01:16] INFO: [assembly_and_basic_annotation] Remove rRNA
[2024-10-09 22:06:40] INFO: [assembly_and_basic_annotation] Use megahit to splice reads into contigs
[2024-10-09 22:08:44] INFO: [assembly_and_basic_annotation] Running diamond blastx to compare /home/vinisantos/anaconda3/envs/mamba/envs/virid/lib/python3.12/site-packages/VirID/data/diamond_database/RdRP_230330_rmdup
[2024-10-09 22:08:46] INFO: [assembly_and_basic_annotation] Running diamond blastx to compare /data/databases/blastdb_08032023/nr/nr
[2024-10-10 00:45:57] INFO: [assembly_and_basic_annotation] Remove contigs that cannot be translated into longer amino acid contigs.
[2024-10-10 00:46:19] INFO: [assembly_and_basic_annotation] Contigs annotation
[2024-10-10 00:51:06] INFO: [assembly_and_basic_annotation] Cut the sequence contamination at both ends of contigs
[2024-10-10 01:10:30] TASK: END Primary Screen PART...
[2024-10-10 01:10:30] ERROR: Controlled exit resulting from early termination.

ZiyueYang01 commented 1 month ago

Do you have the NT database? This step trims host-derived fragments from the contigs. If you don't need it, add --no_trim_contamination to skip this step.

vinicius-santos-bmc commented 1 month ago

Yes, I do, but I was using the nt_core database, which contains virus sequences. I'll try --no_trim_contamination. Maybe in future versions, to avoid having to download the heavy nt_euk and nt_prok databases for contamination removal, you could ask the user for the host genome FASTA from NCBI and map against it with bowtie2 to remove host contamination. There are also smaller databases for removing lab contamination. I think this would be a better option.
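
As a rough illustration of the bowtie2 idea (not something VirID does today), the two commands could be assembled as below. The function name, the output naming scheme and the choice to filter reads rather than contigs are all assumptions made for this sketch. Note that --un-conc keeps every pair that did not align concordantly, which is only an approximation of "non-host".

```python
def host_removal_commands(host_fasta, reads_1, reads_2, out_prefix, threads=4):
    """Return (index command, mapping command) for bowtie2-based host removal.

    Hypothetical helper; the file naming scheme is an assumption.
    """
    index_prefix = f"{out_prefix}_host_index"
    # Build a bowtie2 index from the host genome FASTA supplied by the user.
    build_cmd = ["bowtie2-build", "--threads", str(threads), host_fasta, index_prefix]
    # Map the read pairs and keep those that fail to align concordantly.
    # bowtie2 replaces "%" in the --un-conc path with 1 and 2 for the two mates.
    map_cmd = [
        "bowtie2", "-p", str(threads),
        "-x", index_prefix,
        "-1", reads_1, "-2", reads_2,
        "--un-conc", f"{out_prefix}_nonhost_%.fq",
        "-S", "/dev/null",  # the alignments themselves are not needed
    ]
    return build_cmd, map_cmd
```

Each list can be passed to subprocess.run(cmd, check=True) in turn; the files matching out_prefix_nonhost_1.fq and out_prefix_nonhost_2.fq would then feed the downstream steps.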

ZiyueYang01 commented 1 month ago

If you use the nt_core database, which contains virus sequences, VirID will delete all viral contigs. By the way, the error may not be caused by the trim_contamination part. You can check whether step10_blastn_trimed.fasta (the result of the trim_contamination part) exists in assembly_and_basic_annotation. You can also check the contents of the RPM_abundance folder, which will help us find the problem. In the meantime, thank you for your suggestions; we may improve this part in the next update.
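
The two checks suggested here can be scripted. The sketch below assumes both items sit directly under the directory passed in; the location of the RPM_abundance folder is not stated in this thread, so adjust the paths as needed. inspect_outputs is a hypothetical helper name.

```python
from pathlib import Path


def inspect_outputs(step_dir, names=("step10_blastn_trimed.fasta", "RPM_abundance")):
    """Report whether each expected output is missing, empty or populated.

    Assumes every name lives directly under step_dir (an assumption).
    """
    report = {}
    for name in names:
        path = Path(step_dir) / name
        if not path.exists():
            report[name] = "missing"
        elif path.is_dir():
            # A directory counts as populated if it has at least one entry.
            report[name] = "directory" if any(path.iterdir()) else "empty directory"
        else:
            report[name] = "empty file" if path.stat().st_size == 0 else "non-empty file"
    return report
```

For example, inspect_outputs("virid_output/assembly_and_basic_annotation") returns a small dictionary that can be pasted into a reply.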

franco-ye commented 1 month ago

The second stage does not include filtering out known viral sequences based on NT blastn results. In my analysis, it appears that the known viruses were present in the first stage but were removed in the second stage of building the evolutionary tree. This result contradicts the expected workflow described in the article.

vinicius-santos-bmc commented 1 month ago

The file step10_blastn_trimed.fasta exists, but it's empty. And there is no RPM_abundance folder. So probably the error occurred before that.

I also had a problem when using the --no_trim_contamination argument:

[2024-10-12 12:21:38] INFO: [assembly_and_basic_annotation] Remove rRNA
[2024-10-12 12:27:16] INFO: [assembly_and_basic_annotation] Use megahit to splice reads into contigs
[2024-10-12 12:29:02] INFO: [assembly_and_basic_annotation] Running diamond blastx to compare /home/vinisantos/anaconda3/envs/mamba/envs/virid/lib/python3.12/site-packages/VirID/data/diamond_database/RdRP_230330_rmdup
[2024-10-12 12:29:04] INFO: [assembly_and_basic_annotation] Running diamond blastx to compare /data/databases/blastdb_08032023/nr/nr
[2024-10-12 15:06:31] INFO: [assembly_and_basic_annotation] Remove contigs that cannot be translated into longer amino acid contigs.
[2024-10-12 15:06:57] INFO: [assembly_and_basic_annotation] Contigs annotation
[2024-10-12 15:11:50] INFO: Summary results
[2024-10-12 15:11:51] ERROR: Uncontrolled exit resulting from an unexpected error.

================================================================================
EXCEPTION: KeyError
  MESSAGE: "['longest_aa_length'] not in index"
________________________________________________________________________________

Traceback (most recent call last):
  File "/home/vinisantos/anaconda3/envs/mamba/envs/virid/lib/python3.12/site-packages/VirID/__main__.py", line 55, in main
    gt_parser.parse_options(args)
  File "/home/vinisantos/anaconda3/envs/mamba/envs/virid/lib/python3.12/site-packages/VirID/main.py", line 141, in parse_options
    self.assembly_and_basic_annotation(options)
  File "/home/vinisantos/anaconda3/envs/mamba/envs/virid/lib/python3.12/site-packages/VirID/main.py", line 80, in assembly_and_basic_annotation
    Summary(options,assembly_and_basic_annotation_path,"Primary_screen_res.tsv").run()
  File "/home/vinisantos/anaconda3/envs/mamba/envs/virid/lib/python3.12/site-packages/VirID/rvm/summary.py", line 82, in run
    final_output = nvi[['qseqid','NR_qlen','longest_aa_length','NR_sseqid','protein','NR_Virus','super_group','kindom','phylum','class','order','family','genus','species','virus_type']]
                   ~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vinisantos/anaconda3/envs/mamba/envs/virid/lib/python3.12/site-packages/pandas/core/frame.py", line 4108, in __getitem__
    indexer = self.columns._get_indexer_strict(key, "columns")[1]
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vinisantos/anaconda3/envs/mamba/envs/virid/lib/python3.12/site-packages/pandas/core/indexes/base.py", line 6200, in _get_indexer_strict
    self._raise_if_missing(keyarr, indexer, axis_name)
  File "/home/vinisantos/anaconda3/envs/mamba/envs/virid/lib/python3.12/site-packages/pandas/core/indexes/base.py", line 6252, in _raise_if_missing
    raise KeyError(f"{not_found} not in index")
KeyError: "['longest_aa_length'] not in index" 
================================================================================
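
The KeyError above means the table reaching the summary step has no longest_aa_length column. A check like the following could surface that with a clearer message before the selection on line 82 of summary.py; the column list is copied from the traceback, while missing_columns is a hypothetical helper and not part of VirID.

```python
# Column names copied verbatim from the traceback above.
EXPECTED_COLUMNS = [
    "qseqid", "NR_qlen", "longest_aa_length", "NR_sseqid", "protein",
    "NR_Virus", "super_group", "kindom", "phylum", "class", "order",
    "family", "genus", "species", "virus_type",
]


def missing_columns(available, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent, in expected order.

    `available` can be any iterable of names, e.g. a DataFrame's .columns.
    """
    present = set(available)
    return [name for name in expected if name not in present]
```

With a pandas DataFrame called nvi, as in the traceback, missing_columns(nvi.columns) would list the absent columns, which could then be reported or filled before selecting the final output.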

vinicius-santos-bmc commented 1 month ago

Will this be fixed soon or is there no way to use the tool?

ZiyueYang01 commented 1 month ago

We've fixed the bugs involved in --no_trim_contamination; you can check the logs on GitHub.