Closed shengxingou closed 6 months ago
Yes, this function was achieved by using a convolutional neural network to distinguish between viruses and non-viral components (bacteria and fungi), using the sequence pattern matrix and a neural network framework within IPEV. Searches were performed on the NCBI Assembly database using the commands “(Fungi[orgn]) AND (representative genome[filter])” and “(Bacteria[orgn]) AND (representative genome[filter])” to download data sets. A total of 1,428 bacterial genome sequences were downloaded, with plasmids excluded and only those labeled as “complete” included, and 99 fungal sequences, excluding macrofungi. Using MetaSim, contigs were simulated to create negative sample groups for both bacterial and fungal data sets separately: 1 million for Group A (100-400 bp), 900,000 for Group B (400-800 bp), 800,000 for Group C (800-1,200 bp), and 700,000 for Group D (1,200-1,800 bp). Based on the sequence pattern matrix and neural network framework outlined in our manuscript, the model was trained, validated, and tested on the constructed Groups A to D using an 8:1:1 split for training, validation, and testing, respectively.
In this experiment, test sets were constructed with an equal number of virus-like and non-virus-like organisms (bacteria and fungi) in Group A - Group D. Sensitivity (Sn) and specificity (Sp) for each group were then calculated. For viral identification, Sn rates were 73.1% for Group A, 83.3% for Group B, 90.5% for Group C, and 93.1% for Group D. Correspondingly, for non-viral identification, Sp rates were 84.5% for Group A, 93.7% for Group B, 95.7% for Group C, and 97.3% for Group D. This enhanced functionality, aimed at reducing false-positive (non-viral) samples, has been integrated into the IPEV tool, complete with a user-activated switch.
thank you for your reply
I am very grateful to you for developing this excellent software. I want to know how the function of eliminating false positive non viral components (bacteria and fungi) in the software is implemented. Is this feature implemented through Convolutional Neural Networks (CNNs)? If so, how effective is this feature?
Thank you for your valuable time and energy!