NBISweden / aMeta

Ancient microbiome snakemake workflow
MIT License

sample names and fastqc #52

Closed: percyfal closed this issue 1 month ago

percyfal commented 2 years ago

FastQC uses everything up to the fastq suffix as the sample name. Unless that prefix matches the sample name, the fastqc rules fail with a missing output file exception. Unfortunately, according to a FastQC thread I read somewhere (I can't find the link now), this behaviour cannot be changed. One solution could be to rename the output directory, but I think one would also have to modify the contents of the fastqc output (not sure about this).
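A minimal sketch of the mismatch, assuming FastQC strips a trailing `.gz`/`.bz2` and then a trailing `.fastq`/`.fq` before appending `_fastqc.html` and `_fastqc.zip` (the exact list of stripped suffixes may vary between FastQC versions). The helpers `fastqc_prefix` and `expected_vs_actual` are hypothetical and only illustrate why outputs keyed on the sample name go missing.

```python
import re
from pathlib import PurePath

# Assumption: FastQC removes a compression suffix, then a fastq suffix,
# and names its reports "<prefix>_fastqc.html" / "<prefix>_fastqc.zip".
_COMPRESSION = re.compile(r"\.(gz|bz2)$")
_FASTQ = re.compile(r"\.(fastq|fq)$")


def fastqc_prefix(fastq_path: str) -> str:
    """Return the report prefix FastQC is assumed to derive from a fastq path."""
    name = PurePath(fastq_path).name  # drop leading directories
    name = _COMPRESSION.sub("", name)
    return _FASTQ.sub("", name)


def expected_vs_actual(sample: str, fastq_path: str) -> tuple[list[str], list[str]]:
    """Compare report names a rule keyed on `sample` would declare
    with the names FastQC is assumed to write for `fastq_path`."""
    expected = [f"{sample}_fastqc.html", f"{sample}_fastqc.zip"]
    prefix = fastqc_prefix(fastq_path)
    actual = [f"{prefix}_fastqc.html", f"{prefix}_fastqc.zip"]
    return expected, actual


if __name__ == "__main__":
    exp, act = expected_vs_actual("sampleA", "data/foo.fq.gz")
    print(exp)  # ['sampleA_fastqc.html', 'sampleA_fastqc.zip']
    print(act)  # ['foo_fastqc.html', 'foo_fastqc.zip']
    # Declared outputs FastQC never writes: Snakemake reports these as missing.
    print(sorted(set(exp) - set(act)))
```

When the sample name equals the fastq prefix (e.g. `"foo"` with `data/foo.fq.gz`), both lists coincide and the rule's declared outputs are produced.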

One sample can also be linked to multiple fastq files, which means that 1) there is a need for a runinfo file linking a sample to its fastq files, and 2) QC output has to be handled at the run level, with BAM files merged post-alignment, after which sample-level processing continues.
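A sketch of what such a runinfo file and the run-to-sample grouping might look like. The tab-separated layout with `sample`, `run` and `fastq` columns, the `results/bam/{run}.bam` path pattern, and the helpers `runs_by_sample` and `run_bams` are all hypothetical; aMeta does not define this format.

```python
import csv
import io
from collections import defaultdict

# Hypothetical runinfo: one row per sequencing run, several runs per sample.
RUNINFO = """sample\trun\tfastq
S1\tS1_L001\tdata/S1_L001.fq.gz
S1\tS1_L002\tdata/S1_L002.fq.gz
S2\tS2_L001\tdata/S2_L001.fq.gz
"""


def runs_by_sample(handle) -> dict[str, list[dict[str, str]]]:
    """Group runinfo rows by sample, keeping the file order of runs."""
    groups: dict[str, list[dict[str, str]]] = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        groups[row["sample"]].append(row)
    return dict(groups)


def run_bams(sample: str, groups: dict[str, list[dict[str, str]]]) -> list[str]:
    """Per-run BAM paths (hypothetical naming) to merge into one sample-level BAM."""
    return [f"results/bam/{row['run']}.bam" for row in groups[sample]]


if __name__ == "__main__":
    groups = runs_by_sample(io.StringIO(RUNINFO))
    print(run_bams("S1", groups))
    # ['results/bam/S1_L001.bam', 'results/bam/S1_L002.bam']
```

In a Snakefile, a merge rule could use a function like `run_bams` as an input function, while QC and alignment rules are keyed on the run name. If run names are chosen to match the fastq prefixes, the FastQC naming problem would not arise at the run level.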

LeandroRitter commented 1 month ago

@percyfal we added this explanation to the README some time ago:

Currently, it is important that the sample names in the first column exactly match the names of the fastq-files in the second column. For example, a fastq-file "data/foo.fq.gz" specified in the "fastq" column, must have a name "foo" in the "sample" column. Please make sure that the names in the first and second columns match.
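The README rule could also be checked up front, so a mismatch fails early with a readable message instead of a missing output file. This is a sketch only: it assumes a tab-separated sample sheet with a header row naming the `sample` and `fastq` columns, and the suffix list and the helpers `fastq_stem`, `mismatched_rows` and `check_sample_sheet` are hypothetical, not part of aMeta.

```python
import csv
import io
from pathlib import PurePosixPath

# Assumed set of fastq suffixes; longer compound suffixes are listed first.
FASTQ_SUFFIXES = (".fastq.gz", ".fq.gz", ".fastq", ".fq")


def fastq_stem(path: str) -> str:
    """Strip directories and a known fastq suffix: 'data/foo.fq.gz' -> 'foo'."""
    name = PurePosixPath(path).name
    for suffix in FASTQ_SUFFIXES:
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return name


def mismatched_rows(handle) -> list[tuple[str, str]]:
    """Return (sample, fastq) pairs whose fastq stem differs from the sample name."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        (row["sample"], row["fastq"])
        for row in reader
        if fastq_stem(row["fastq"]) != row["sample"]
    ]


def check_sample_sheet(path: str) -> None:
    """Raise ValueError listing every row that breaks the naming rule."""
    with open(path, newline="") as handle:
        bad = mismatched_rows(handle)
    if bad:
        details = "\n".join(f"  sample={s!r} fastq={f!r}" for s, f in bad)
        raise ValueError("sample names must match fastq file names:\n" + details)


if __name__ == "__main__":
    sheet = io.StringIO("sample\tfastq\nfoo\tdata/foo.fq.gz\nbar\tdata/bar_R1.fastq.gz\n")
    print(mismatched_rows(sheet))  # [('bar', 'data/bar_R1.fastq.gz')]
```

Calling something like `check_sample_sheet` near the top of a Snakefile would stop the run before any rule is scheduled.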

Since then, I believe nobody has reported problems. I agree it is not optimal, but do you have time to fix it? Perhaps it is not worth spending time on and we can close it?

percyfal commented 1 month ago

We can close this for now, I agree. Feel free to reopen if necessary.