Closed huzuner closed 2 weeks ago
I see that I mistakenly used the primer for the test. However I can't find the SARS-CoV-2-artic-v4_1.primer.bed
under resources. Is this already known or am I missing something?
The latest commit to the master branch now uses nCoV-2019.primer.bed
also in the main config. Then, this PR should be ready to go.
I would rather put all the rules that require the primer bed in an if that checks whether the primers are defined in the config file. If it is missing, one can assume normal shotgun sequencing, so that no primers are needed anyway.
I initiated something above and checked the workflow for a while now and I'm a bit lost at the point where upstream of get_samtools_sort_input
requires the primer file. Apparently, it is depending on stages "initial" and "hardclipped" and I don't have much of a clue how to add an if for an unclipped fasta there. Another file that is depending on the primer is results/{date}/corrected/{sample}/{sample}.correctedReads.clip.fasta
and it is used in the function get_reads_by_stage()
and there are additional stages as well.
I am wondering whether I understand correctly or if there is a simpler way of configuring the whole workflow to include a check for the primer.
Also, I find it a bit difficult to see how that will fix the file not found error when executing the workflow on the cluster. When using non-shotgun sequencing samples, if the primer file is not being found, it will assume shotgun sequencing, why would we want to allow that? or am I missing something here?
Right, not as easy as I thought. The info whether a sample is amplicon or not is denoted in the sample sheet column is_amplicon_data
.
I think it should be fine to just keep the original approach that takes the bed file from the config, and just fix this here:
rule bed_to_bedpe:
input:
check_bed_for_URL(config["preprocessing"]["amplicon-primers"]),
Basically, I would move this check into the script of this rule instead. Otherwise, the pipeline is also not platform independent because users might want to pull stuff e.g. from S3. I believe, then one does not even need to change anything else. Just, when running the pipeline as a module, one has to specify the bed file in the correct location and cannot rely on it to be delivered with the pipeline. The latter could be circumvented by allowing to specify a URL for the bed file instead of a file path. However, this is only needed if one really has amplicon data. For normal WGS, it should just run because the path denoted in the config file will never be checked. If it is still checked somewhere else, we can fix that.
Then, I hope my latest changes upon your suggestion fixes the issue.
I think this fix is needed: https://github.com/IKIM-Essen/uncovar/pull/668
I just disabled the formatting as the fork has problems checking out the branch.
Description
Before this change, I suspect that the primer file cannot be found by the file system storage in the cluster. This leads to an error during module import of Uncovar pipeline. This PR moves the file under workflow/resources and fixes the file path in the config.yaml.
Related Issue
Checklist
pre-commit
if needed.CODE_OF_CONDUCT.md
document.CONTRIBUTING.md
guide.