BU-ISCIII / relecov-tools

Set of helper tools for assembling the different elements of the RELECOV platform (Spanish Network for genomic surveillance of SARS-CoV-2), such as data download, processing, validation and upload to public databases, as well as analysis runs and database storage.
GNU General Public License v3.0

Read-lab-metadata search for files recursively when no samples_data is given #318

Open Shettland opened 1 month ago

Shettland commented 1 month ago

Recent changes made the samples_data.json file (generated by the "download" module) optional for the read-lab-metadata module. As a result, when no samples_data is given, every sample passes the file integrity filter and the "fastq_filepath" fields are saved as the folder where the provided metadata file is located.

It would be nice if the module used os.walk() to recursively search for the files named in ["sequence_file_RX_fastq"] and used this information to create a samples_data.json that includes the paths and md5s of each sample's files. Samples with any issue (corrupted files, non-existing files, or an md5 mismatch when an md5 is given) would be removed from the final json.
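
A rough sketch of what this could look like, not the actual relecov-tools code. The function names, the `samples` input layout (sample ID mapped to a list of file names) and the output keys (`name`, `path`, `md5`) are all assumptions made for illustration; the real samples_data.json layout may differ. Corruption is only approximated here by rejecting empty files.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_files(root):
    """Walk root recursively and map each file name to the first path found."""
    found = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.setdefault(name, os.path.join(dirpath, name))
    return found


def build_samples_data(root, samples, expected_md5=None):
    """Build a samples_data-like dict (hypothetical layout).

    samples: {sample_id: [file names taken from the metadata]}
    expected_md5: optional {file name: md5 hex digest}
    Returns (valid, rejected); rejected maps sample_id to a reason.
    """
    expected_md5 = expected_md5 or {}
    located = index_files(root)
    valid, rejected = {}, {}
    for sample_id, file_names in samples.items():
        entries, problem = [], None
        for name in file_names:
            path = located.get(name)
            if path is None:
                problem = f"file not found: {name}"
                break
            if os.path.getsize(path) == 0:
                # Stand-in for a real corruption check
                problem = f"empty file: {name}"
                break
            md5 = md5_of_file(path)
            if name in expected_md5 and expected_md5[name] != md5:
                problem = f"md5 mismatch: {name}"
                break
            entries.append({"name": name, "path": path, "md5": md5})
        if problem is None:
            valid[sample_id] = entries
        else:
            rejected[sample_id] = problem
    return valid, rejected
```

The `valid` dict could then be written out with `json.dump()`, while `rejected` could be logged so the user knows why a sample was dropped.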

Shettland commented 2 weeks ago

Instead of this, a new argument pointing to the folder where the files are located will be added.