Arcadia-Science / seqqc

A Nextflow pipeline to identify quality control issues with new sequencing data.
MIT License

Remove `params.fasta` since this should work on fastq data #2

Closed by taylorreiter 1 year ago

taylorreiter commented 1 year ago

Description of feature

https://github.com/Arcadia-Science/seqqc/blob/main/workflows/seqqc.nf#L14

I guess some people convert their reads to fasta format, but that doesn't really make sense for this workflow, since the input should be data in the rawest possible format...
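For illustration only (this is not code from the seqqc pipeline): a minimal Python sketch of how a script might tell fastq from fasta input by looking at the first record character. The function name `sniff_read_format` is hypothetical, and the check assumes well-formed files where fastq records start with `@` and fasta records start with `>`.

```python
import gzip


def sniff_read_format(path):
    """Guess the format from the first non-empty line of the file.

    Hypothetical helper for illustration: returns 'fastq', 'fasta',
    or 'unknown'. Files ending in .gz are read as gzip-compressed text.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip leading blank lines
            if line.startswith("@"):
                return "fastq"  # fastq headers begin with '@'
            if line.startswith(">"):
                return "fasta"  # fasta headers begin with '>'
            return "unknown"
    return "unknown"  # empty file
```

A check like this only inspects the first record, so it is a quick guard and not a full validator.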

taylorreiter commented 1 year ago

done in #8

elizabethmcd commented 1 year ago

A related comment I want to document: the seqqc pipeline should be for QC'ing raw sequence data for rapid upload to public repositories. I think we will want to think about how we QC the genomes we assemble, beyond just single-copy marker counts, before uploading to GenBank etc. I don't know if we want that as a parallel/separate workflow in seqqc or as a penultimate step in the relevant separate workflow, such as the PacBio assembly workflow and the metagenomic binning workflow. It could probably be an nf-core style module/subworkflow if we write it correctly, and then we can port it easily between workflows.

taylorreiter commented 1 year ago

Right, this is only for fastq data. I would rather keep it small in scope, since I think that will be easier to maintain in the long run. I could rename it from seqqc to fastqqc, but I think that's too close to fastqc the tool. Or I could do rawseqqc or something. At this point it would be marginally annoying to change the name, but I def could do it.

elizabethmcd commented 1 year ago

seqqc is fine, just wanted to get this thought about genome QC written down somewhere before I forgot

taylorreiter commented 1 year ago

FYI I've seen a big push happening on this pipeline recently: https://github.com/nf-core/genomeassembler