A streamlined workflow and GUI for real-time species identification and pathogen characterization via nanopore sequencing data. Engineered for precision, speed, and user-friendliness, with offline functionality post-initialization.
GNU General Public License v3.0
Enhancements to `file_utils.py` and `transform_utils.py` to implement mode gtdb-file #51
This PR extends `file_utils.py` and `transform_utils.py` to better handle GTDB metadata. The aim is to streamline downloading, reading, and processing the GTDB metadata file, particularly when the gtdb-file mode is used together with other modes in `nanometa_prepare.py`.
Changes:
Download GTDB Metadata: Implemented `download_gtdb_metadata()` to download the GTDB metadata files, and corrected the directory paths it uses.
Read and Process GTDB Metadata: Added `read_and_process_gtdb_metadata()`, which reads the GTDB metadata file, filters it against the Kraken2 taxonomy, and returns a DataFrame. It also logs the number of rows before and after filtering.
Adding Tax IDs: Introduced `add_taxid_to_results()` in `transform_utils.py`, which maps species names to tax IDs and adds them to the DataFrame as a new column.
Main Script Update: Added calls to these new functions in `nanometa_prepare.py` to integrate them into the existing workflow.
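For reviewers, the read-and-filter step can be pictured roughly as follows. This is a minimal sketch, not the code in this PR: the function name `read_and_filter_gtdb`, the `kraken_species` argument, the derived `species` column, and the reliance on a `gtdb_taxonomy` column are all assumptions made for illustration, and the taxonomy strings in the example are shortened.

```python
import io
import logging

import pandas as pd

logger = logging.getLogger(__name__)


def read_and_filter_gtdb(source, kraken_species):
    """Read a GTDB metadata TSV and keep rows whose species is in kraken_species.

    source: path or file-like object with tab-separated GTDB metadata.
    kraken_species: iterable of species names taken from the Kraken2 taxonomy
    (hypothetical input; the PR does not say how the taxonomy is supplied).
    """
    metadata = pd.read_csv(source, sep="\t", dtype=str)
    rows_before = len(metadata)

    # Assumed layout: taxonomy strings end in "s__<species name>".
    # Extract the text after the final "s__" as the species name.
    species = metadata["gtdb_taxonomy"].str.extract(r"s__([^;]+)$", expand=False)
    filtered = metadata.assign(species=species)
    filtered = filtered[filtered["species"].isin(set(kraken_species))]
    filtered = filtered.reset_index(drop=True)

    logger.info(
        "GTDB metadata rows: %d before filtering, %d after",
        rows_before,
        len(filtered),
    )
    return filtered


# Tiny in-memory table standing in for a downloaded GTDB metadata file.
example_tsv = (
    "accession\tgtdb_taxonomy\n"
    "GB_1\td__Bacteria;g__Escherichia;s__Escherichia coli\n"
    "GB_2\td__Bacteria;g__Bacillus;s__Bacillus subtilis\n"
    "GB_3\td__Bacteria;g__Escherichia;s__Escherichia coli\n"
)
filtered_example = read_and_filter_gtdb(io.StringIO(example_tsv), {"Escherichia coli"})
```

Logging the row counts around the filter, as the PR does, makes it easy to spot a filter that silently drops everything, for example because of a naming mismatch between GTDB and Kraken2.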
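The tax ID step described for `add_taxid_to_results()` amounts to a name-to-ID lookup added as a new column. A hedged sketch of that idea is below. The helper name `add_taxid_column`, its parameters, and the column names `species` and `taxid` are hypothetical, and the mapping dictionary stands in for whatever source the actual implementation uses.

```python
import pandas as pd


def add_taxid_column(results, name_to_taxid, name_col="species", taxid_col="taxid"):
    """Return a copy of results with a tax ID column looked up from species names.

    name_to_taxid: dict mapping species name -> integer tax ID (assumed input).
    """
    out = results.copy()
    # Series.map leaves a missing value where a name has no entry in the dict;
    # the nullable Int64 dtype keeps the found IDs as integers regardless.
    out[taxid_col] = out[name_col].map(name_to_taxid).astype("Int64")
    return out


# Illustrative data only; the second name deliberately has no mapping.
example_results = pd.DataFrame({"species": ["Escherichia coli", "Unmapped species"]})
example_mapping = {"Escherichia coli": 562}
with_taxids = add_taxid_column(example_results, example_mapping)
```

Returning a copy leaves the caller's DataFrame unchanged, and unmapped names show up as missing values rather than raising, so they can be counted or logged afterwards.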