taylorreiter closed 1 year ago
done in #8
A related comment I want to document: the seqqc pipeline should be for QC'ing raw sequence data for rapid upload to public repositories. I think we will want to think about how we QC the genomes we assemble, beyond just single-copy marker counts, before uploading to GenBank etc. I don't know if we want that as a parallel/separate workflow in seqqc or as a penultimate step in the relevant separate workflow, such as the PacBio assembly workflow or the metagenomic binning workflow. It could probably be an nf-core style module/subworkflow if we write it correctly, and then we can port it easily between workflows.
Right, this is only for FASTQ data -- I would rather keep it small in scope and limited to FASTQ data -- I think that will be easier to maintain in the long run. I could rename it from seqqc to fastqqc, but I think that's too close to FastQC, the tool. Or I could do rawseqqc or something. At this point it would be marginally annoying to change the name, but I def could do it.
seqqc is fine, just wanted to get this thought about genome QC out somewhere before I forgot
FYI I've seen a big push happening on this pipeline recently: https://github.com/nf-core/genomeassembler
Description of feature
https://github.com/Arcadia-Science/seqqc/blob/main/workflows/seqqc.nf#L14
I guess some people convert their reads to FASTA format, but that doesn't really make sense for this workflow, as the input should be data in the rawest possible format...
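
For illustration only, here is a minimal sketch of what a FASTQ-only input check could look like. It is not taken from the linked line in `seqqc.nf`; the function name `is_fastq_path` and the accepted extensions (`.fastq`, `.fq`, optionally `.gz`) are assumptions made for the example.

```python
import re

# Hypothetical pattern (an assumption, not from seqqc): accept .fastq or .fq,
# optionally gzip-compressed, and nothing else.
FASTQ_PATTERN = re.compile(r"\.(fastq|fq)(\.gz)?$", re.IGNORECASE)


def is_fastq_path(path: str) -> bool:
    """Return True if the file name ends in a FASTQ extension."""
    return FASTQ_PATTERN.search(path) is not None


if __name__ == "__main__":
    for name in ["sample_R1.fastq.gz", "sample.fq", "sample.fasta", "sample.fa.gz"]:
        status = "accepted" if is_fastq_path(name) else "rejected"
        print(f"{name}: {status}")
```

A check like this would reject reads that were converted to FASTA, which fits the idea that the workflow should only see raw data.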