dyxstat / ViralCC

ViralCC: leveraging metagenomic proximity-ligation to retrieve complete viral genomes
GNU Affero General Public License v3.0

ValueError: max() arg is an empty sequence #3

Closed: haodun-li closed this issue 10 months ago

haodun-li commented 11 months ago

Hi there,

I'm getting an error when running ViralCC:

python3 /media/16T/lhd/viralcc/ViralCC/viralcc.py pipeline -v fuji-meta-vmeta.fa MAP_SORTED.bam virus.txt ./output

Here is the error:

Traceback (most recent call last):
  File "/media/16T/lhd/viralcc/ViralCC/viralcc.py", line 133, in <module>
    cl = ClusterBin(args.OUTDIR,
  File "/media/16T/lhd/viralcc/ViralCC/bin.py", line 62, in __init__
    optimal = SIL_score.index(max(SIL_score))
ValueError: max() arg is an empty sequence

Thank you for the help!
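The last frame of the traceback selects the clustering with the highest silhouette score. `max()` raises `ValueError` when it is given an empty list. The sketch below shows that failure mode. It assumes, without confirmation from ViralCC's source, that `SIL_score` holds one silhouette score per candidate clustering and ends up empty when no viral contigs survive filtering. `pick_optimal` is a hypothetical helper and is not part of ViralCC:

```python
from typing import List


def pick_optimal(sil_scores: List[float]) -> int:
    """Return the index of the highest silhouette score.

    Hypothetical helper mirroring `SIL_score.index(max(SIL_score))`,
    but failing with a descriptive message when the list is empty.
    """
    if not sil_scores:
        raise ValueError(
            "no silhouette scores were computed; check that at least one "
            "viral contig passed the length filter"
        )
    return sil_scores.index(max(sil_scores))


if __name__ == "__main__":
    sil_score: List[float] = []  # what the traceback implies: nothing to compare
    try:
        sil_score.index(max(sil_score))
    except ValueError as err:
        # Same exception type as the traceback; the exact wording varies by Python version.
        print(f"ValueError: {err}")
    print(pick_optimal([0.12, 0.48, 0.31]))  # index of the best score
```

A guard like this only makes the error message more descriptive. It does not fix the underlying empty input.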

dyxstat commented 10 months ago

Hi,

Thanks for trying our software and really sorry for the late reply. Could you provide me with your log file of ViralCC's execution?

Best

haodun-li commented 10 months ago

Thank you for your reply! This is what's in the log file:

DEBUG | 2023-11-02 18:13:41,125 | main | ViralCC v1.0.0, released at 03/2022
DEBUG | 2023-11-02 18:13:41,126 | main | 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0]
DEBUG | 2023-11-02 18:13:41,126 | main | Command line: /media/16T/lhd/viralcc/ViralCC/viralcc.py pipeline -v fuji-meta-vmeta.fa MAP_SORTED.bam virus.txt ./output
INFO | 2023-11-02 18:13:41,126 | construct_graph | Reading fasta file...
DEBUG | 2023-11-02 18:13:41,608 | construct_graph | There are totally 149075 contigs in reference fasta
INFO | 2023-11-02 18:13:41,886 | construct_graph | Filtering contigs by minimal length(1000)...
DEBUG | 2023-11-02 18:13:41,909 | construct_graph | 0 contigs miss and 141121 contigs are too short
DEBUG | 2023-11-02 18:13:41,909 | construct_graph | Accepted 7954 contigs covering 21030666 bp
INFO | 2023-11-02 18:13:41,909 | construct_graph | Counting reads in bam file...
DEBUG | 2023-11-02 18:25:58,101 | construct_graph | BAM file contains 619630073 alignments
INFO | 2023-11-02 18:25:58,101 | construct_graph | Handling the alignments...
DEBUG | 2023-11-02 18:43:20,388 | construct_graph | Pair accounting: OrderedDict([('accepted pairs', 1147567), ('map_same_contig pairs', 279024340), ('ref_excluded pairs', 26430132), ('poor_match pairs', 898258), ('single read', 4629479)])
INFO | 2023-11-02 18:43:20,420 | construct_graph | There are 0 viral contigs
INFO | 2023-11-02 18:43:20,420 | construct_graph | There are 7954 potential host contigs
INFO | 2023-11-02 18:43:20,420 | construct_graph | Write information of viral contigs and potential host contigs
INFO | 2023-11-02 18:43:20,426 | construct_graph | the threshold of shared host contig is 4
INFO | 2023-11-02 18:43:20,426 | construct_graph | there are 0 edges in the host proximity graph
INFO | 2023-11-02 18:43:20,426 | construct_graph | there are 0.0 edges in the Hi-C interaction graph
INFO | 2023-11-02 18:43:20,426 | construct_graph | Integrate the Hi-C interaction graph and the host proximity graph
INFO | 2023-11-02 18:43:20,426 | construct_graph | Integrative graph construction finished and there are 0.0 edges in the integrative graph

dyxstat commented 10 months ago

Hi,

The problem arises because no viral contigs were accepted from your assembly: all the viral contigs listed in your 'virus.txt' file were filtered out.

DEBUG | 2023-11-02 18:13:41,909 | construct_graph | 0 contigs miss and 141121 contigs are too short

INFO | 2023-11-02 18:43:20,420 | construct_graph | There are 0 viral contigs

Please check your assembly. 141,121 of the 149,075 assembled contigs were filtered out because they are shorter than 1,000 bp, which is very unusual.
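Both symptoms in the log can be checked before the pipeline is rerun: the 141,121 short contigs and the 0 viral contigs. The sketch below rests on several assumptions. It assumes that `virus.txt` lists one contig ID per line, that those IDs must match the first word of each FASTA header, and that contigs of at least 1,000 bp pass the length filter reported in the log. The function names are hypothetical, and ViralCC's own parsing may differ:

```python
from typing import Dict, Iterable, List, Optional, Tuple


def read_fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map each FASTA record ID (first word of its header) to its sequence length."""
    lengths: Dict[str, int] = {}
    current: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            current = parts[0] if parts else ""
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)  # sequences may span several lines
    return lengths


def check_viral_ids(
    lengths: Dict[str, int], viral_ids: Iterable[str], min_len: int = 1000
) -> Tuple[List[str], List[str], List[str]]:
    """Split listed viral IDs into (kept, too_short, missing_from_fasta)."""
    kept: List[str] = []
    too_short: List[str] = []
    missing: List[str] = []
    for vid in (v.strip() for v in viral_ids):
        if not vid:
            continue  # ignore blank lines
        if vid not in lengths:
            missing.append(vid)
        elif lengths[vid] < min_len:
            too_short.append(vid)
        else:
            kept.append(vid)
    return kept, too_short, missing


def summarize(fasta_path: str, viral_path: str, min_len: int = 1000) -> None:
    """Print how many contigs and listed viral IDs would pass the length filter."""
    with open(fasta_path) as fh:
        lengths = read_fasta_lengths(fh)
    n_short = sum(1 for n in lengths.values() if n < min_len)
    print(f"{len(lengths)} contigs, {n_short} shorter than {min_len} bp")
    with open(viral_path) as fh:
        kept, too_short, missing = check_viral_ids(lengths, fh, min_len)
    print(f"viral IDs: kept={len(kept)} too_short={len(too_short)} missing={len(missing)}")
    for vid in missing[:5]:
        print(f"  not found in FASTA: {vid}")
```

For example, `summarize("fuji-meta-vmeta.fa", "virus.txt")` prints both counts. A large `missing` count would suggest that the IDs in `virus.txt` do not match the assembly headers. A large `too_short` count means the listed viral contigs fall below the length cutoff.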

Best

haodun-li commented 10 months ago

Hi, thanks for your answer. Maybe my "virus.txt" file is incorrect. I have updated the file and am rerunning ViralCC. Best

haodun-li commented 10 months ago

Hi, thank you very much for your help. After I updated "virus.txt", the problem above was resolved. When the ViralCC run finished, I got several result files. Your article says that ViralCC "detects the virus-host pairs based on recovered viral genomes and Hi-C linkages". However, I could not find any obvious information about virus-host correspondence in these result files. How can I identify virus-host pairs from the output files? Here is part of the log file:

INFO | 2023-12-07 18:14:24,686 | construct_graph | There are 99 viral contigs
INFO | 2023-12-07 18:14:24,686 | construct_graph | There are 7855 potential host contigs
INFO | 2023-12-07 18:14:24,686 | construct_graph | Write information of viral contigs and potential host contigs
INFO | 2023-12-07 18:14:24,696 | construct_graph | the threshold of shared host contig is 4
INFO | 2023-12-07 18:14:24,696 | construct_graph | there are 2 edges in the host proximity graph
INFO | 2023-12-07 18:14:24,696 | construct_graph | there are 29.0 edges in the Hi-C interaction graph
INFO | 2023-12-07 18:14:24,697 | construct_graph | Integrate the Hi-C interaction graph and the host proximity graph
INFO | 2023-12-07 18:14:24,697 | construct_graph | Integrative graph construction finished and there are 29.0 edges in the integrative graph
INFO | 2023-12-07 18:14:24,708 | bin | the number of generated viral bins is 96
INFO | 2023-12-07 18:14:24,708 | main | Clustering fininshed
INFO | 2023-12-07 18:14:24,708 | main | Writing bins...

Your answer is very important to me. Thank you again for your help. Best!

dyxstat commented 10 months ago

Hi,

The ViralCC software serves as a binning tool for viral contigs. To replicate the results regarding virus-host interactions presented in our paper, you'll then need to bin non-viral contigs using your preferred binning tools and generate the raw Hi-C contact matrix for all contigs including viral and non-viral contigs (tools like NormCC can facilitate this process).

Utilizing both the raw Hi-C contact matrix and the identified viral as well as non-viral bins, you can then discern the viral-host pairs.
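Candidate pairs can be surfaced once three inputs are available: viral bins from ViralCC, non-viral bins from another binner, and a raw contig-level Hi-C contact table. One simple approach is to sum the contacts between every viral bin and every host bin. This is an illustrative sketch, not the procedure from the ViralCC paper. The input dictionaries, the `min_contacts` cutoff and the function name are hypothetical, and the sketch does not assume any particular NormCC output format:

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def link_viral_to_host_bins(
    contacts: Dict[Tuple[str, str], float],
    viral_bin_of: Dict[str, str],
    host_bin_of: Dict[str, str],
    min_contacts: float = 1.0,
) -> Dict[str, List[Tuple[str, float]]]:
    """Sum raw Hi-C contacts between each viral bin and each host bin.

    `contacts` maps a contig pair to its raw contact count. Returns, for each
    viral bin with at least one qualifying link, the host bins whose summed
    contacts reach `min_contacts`, strongest first.
    """
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for (a, b), count in contacts.items():
        # Contacts are undirected, so test both orientations of the pair.
        for v, h in ((a, b), (b, a)):
            if v in viral_bin_of and h in host_bin_of:
                totals[(viral_bin_of[v], host_bin_of[h])] += count
    links: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for (vbin, hbin), total in totals.items():
        if total >= min_contacts:
            links[vbin].append((hbin, total))
    for hits in links.values():
        hits.sort(key=lambda item: item[1], reverse=True)
    return dict(links)
```

Raw contact counts are biased by factors such as contig length and coverage. In practice you would likely normalize them before applying a cutoff, which is the task NormCC is designed for.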

Best