Xuyen21 opened this issue 9 months ago
Hi, sorry for the slow answer, I only came back from vacation this week.
Following the guide for the wastewater experimental branch.
By the way, the wastewater specifics have now been merged into the main master branch.
I got stuck in the last stage of vpipe deconvolution as you can see in this error log: deconvoluted.err.log.
Could you also provide the deconvoluted.out.log? It would give details about the parameters used for the deconvolution. (It recapitulates everything that was loaded from the various .yaml files and/or inferred automatically from the data.)
will return an empty data frame.
Indeed, that's the problematic part. For some reason it can't generate a deconvolution for the given input parameters. Most likely it's not considering the correct date range or the correct variants for the period.
I noticed that in the generated variants_pangolin.yaml file, start_date and end_date are not added in the previous step.
Normally, the dates should be inferred automatically from the range of the "date" column in results/tallymut.tsv.zst (this should be mentioned in deconvoluted.out.log).
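A quick way to see which range LolliPop ought to infer is to look at the "date" column yourself. The sketch below is a minimal check, assuming the file was first decompressed (e.g. zstd -d results/tallymut.tsv.zst) and is tab-separated with ISO dates in a column called "date"; the helper names tally_date_range and overlaps are hypothetical, not part of LolliPop. If a deconvolution window does not overlap the data's range at all, filtering on it yields no rows, which is one possible route to an empty data frame.

```python
import csv
import io
from datetime import date
from typing import Tuple


def tally_date_range(tsv_text: str, date_col: str = "date") -> Tuple[date, date]:
    """Return the earliest and latest date found in a tab-separated table."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # Keep only the YYYY-MM-DD part in case a time of day is appended (assumption).
    dates = [date.fromisoformat(row[date_col][:10]) for row in reader if row.get(date_col)]
    if not dates:
        raise ValueError(f"no values found in column {date_col!r}")
    return min(dates), max(dates)


def overlaps(data_range: Tuple[date, date], start: date, end: date) -> bool:
    """True if the window [start, end] shares at least one day with the data's range."""
    first, last = data_range
    return start <= last and first <= end
```

For the real file, tally_date_range(open("results/tallymut.tsv").read()) would return the (earliest, latest) pair to compare against what deconvoluted.out.log reports.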
Could you also provide your V-pipe config file?
Among other things:
Regarding file var_dates.yaml: You only need entries when the mixture of present variants (e.g.: as detected by COJAC) changes.
E.g.: if before 2022-07-01 you have one mixture of variants, and it changes afterwards, you just write:
var_dates:
  '2022-05-01':
    # at the beginning of the project, only B.1.1.7 'Alpha' and P.1 'Gamma' are present
    - B.1.1.7
    - P.1
  '2022-07-01':
    # starting from July, Delta B.1.617.2 and Omicron BA.1 showed up to the party
    - B.1.1.7
    - B.1.617.2
    - P.1
    - BA.1
This will cause LolliPop to run one deconvolution for all samples between May and July, quantifying only B.1.1.7 and P.1, then a second deconvolution for all samples after July, this time also looking for B.1.617.2 and BA.1, and then concatenate the curves chronologically.
The way you wrote your YAML, LolliPop will start one deconvolution each month (from 2022-06-12 to 2022-07-17, then from 2022-07-17 to 2022-08-14, then from 2022-08-14 to 2022-09-18, then everything after 2022-09-18), but each time you asked it to estimate the proportions of the same mixture of variants (B.1.1.7, B.1.617.2, P.1, BA.1).
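The windowing described above can be sketched as follows. This is an illustration under the assumption that each key opens a window lasting until the next key (the last one open-ended); the function names var_dates_windows and redundant_keys are hypothetical and this is not LolliPop's actual code. The second helper flags entries that repeat the previous mixture, i.e. ones that only split the deconvolution into more pieces without changing the variants.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple

# (start, end, variants); end is None for the last, open-ended window
Window = Tuple[date, Optional[date], List[str]]


def var_dates_windows(var_dates: Dict[str, List[str]]) -> List[Window]:
    """Split the timeline at every var_dates key, in chronological order."""
    starts = sorted(var_dates, key=date.fromisoformat)
    windows: List[Window] = []
    for i, key in enumerate(starts):
        end = date.fromisoformat(starts[i + 1]) if i + 1 < len(starts) else None
        windows.append((date.fromisoformat(key), end, list(var_dates[key])))
    return windows


def redundant_keys(var_dates: Dict[str, List[str]]) -> List[str]:
    """Keys whose variant set equals that of the preceding key."""
    starts = sorted(var_dates, key=date.fromisoformat)
    return [cur for prev, cur in zip(starts, starts[1:])
            if set(var_dates[prev]) == set(var_dates[cur])]


example = {
    "2022-05-01": ["B.1.1.7", "P.1"],
    "2022-07-01": ["B.1.1.7", "B.1.617.2", "P.1", "BA.1"],
}
for start, end, variants in var_dates_windows(example):
    print(start, end, variants)
```

Applied to the monthly file described above (four keys from 2022-06-12 to 2022-09-18, all listing B.1.1.7, B.1.617.2, P.1 and BA.1), redundant_keys returns the last three dates. If the YAML is loaded with PyYAML, keep the dates quoted as in the example: unquoted dates are parsed as datetime.date objects rather than strings.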
Hi @DrYak, thank you for your reply.
This time I ran it with only two variants (Alpha and Delta) from references/voc.
Here is the deconvoluted.out.log file: deconvoluted.out.log
Could you also provide your V-pipe config file?
Here is my config.yaml file:
general:
  virus_base_config: 'sars-cov-2'
  primers_trimmer: samtools
  # for Oxford nanopore
  aligner: minimap
  reprocessor: skip
input:
  datadir: samples/
  samples_file: samples.tsv
  # for Oxford nanopore
  paired: false
  # generated with COJAC (or obtained from us)
  variants_def_directory: references/voc/
  protocols_file: references/primers.yaml
output:
  datadir: results/
  trim_primers: true
  snv: false
  local: false
  global: false
  visualization: false
  diversity: false
  QA: false
  upload: false
  dehumanized_raw_reads: false
  # note no wastewater output flag for now, rules called explicitly
# for Oxford nanopore
minimap_align:
  preset: 'map-ont'
# if dates and location are extracted from sample names:
timeline:
  # timeline_tsv: timeline.tsv
  regex_yaml: regex.yaml
  locations_table: wastewater_plants.tsv
deconvolution:
  threads: 8
  # this file corresponds to the parameters used now on our curves:
  # (provided by us)
  deconvolution_config: deconv_bootstrap_cowwid.yaml
  # file that specifies which variant are present at which time point, as determined by looking at COJAC's results
  # done manually by user
  variants_dates: var_dates.yaml
  # automatically generated
  variants_config: results/variants_pangolin.yaml
Are you making your own variants-config YAML, or are you letting V-pipe re-use the results/variants_pangolin.yaml automatically created during the previous step?
I let V-pipe reuse the results/variants_pangolin.yaml file.
Regarding file var_dates.yaml: You only need entries when the mixture of present variants (e.g.: as detected by COJAC) changes.
The variants Alpha (B.1.1.7) and Delta (B.1.617.2) always appear, so I adjusted var_dates.yaml like this:
var_dates:
  '2022-06-12':
    - B.1.1.7
    - B.1.617.2
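With a single entry, one thing worth ruling out is samples dated before 2022-06-12. That LolliPop leaves such samples out of every window is an assumption, not something established in this thread; the hypothetical helper below just lists sample dates that precede the first var_dates key. The sample dates in the example are made up.

```python
from datetime import date
from typing import Dict, Iterable, List


def dates_before_first_entry(sample_dates: Iterable[str],
                             var_dates: Dict[str, List[str]]) -> List[str]:
    """Return the distinct sample dates (ISO strings) earlier than the first var_dates key."""
    first = min(date.fromisoformat(k) for k in var_dates)
    # ISO-formatted strings sort chronologically, so sorted() gives date order.
    return sorted({d for d in sample_dates if date.fromisoformat(d) < first})


user_var_dates = {"2022-06-12": ["B.1.1.7", "B.1.617.2"]}
# Hypothetical sample dates, for illustration only.
print(dates_before_first_entry(["2022-06-05", "2022-06-12", "2022-07-01"], user_var_dates))
```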
My regex.yaml file:
sample: \w{2}_(?P<location>\d+)_(?P<year>20\d{2})_(?P<month>[01]?\d)_(?P<day>[0-3]?\d)
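To confirm that sample names actually yield a location and a date, the pattern can be tried directly with Python's re module. This is a sketch: the sample names are hypothetical, and using re.match (anchored at the start only) is an assumption, since how V-pipe applies the pattern is not shown here.

```python
import re

# Pattern copied verbatim from the regex.yaml above.
SAMPLE_RE = re.compile(
    r"\w{2}_(?P<location>\d+)_(?P<year>20\d{2})_(?P<month>[01]?\d)_(?P<day>[0-3]?\d)"
)


def parse_sample_name(name: str) -> dict:
    """Return the named groups for a sample name, or raise if it does not match."""
    m = SAMPLE_RE.match(name)
    if m is None:
        raise ValueError(f"sample name {name!r} does not match the pattern")
    return m.groupdict()


# Hypothetical sample name following the expected layout.
print(parse_sample_name("WW_05_2022_06_12"))
```

Note that the captured location is the string "05", leading zero included.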
My wastewater_plants.tsv file:
code location
05 Davos
10 Schanf
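The location captured from sample names is a string such as "05", so the code column has to be compared as text too; if it were read as an integer, "05" would become 5 and no longer match. Whether anything in the pipeline does that is not established here. A minimal sketch of a lookup that keeps codes as strings, assuming the file is tab-separated with the header code and location shown above (load_locations is a hypothetical name):

```python
import csv
import io
from typing import Dict


def load_locations(tsv_text: str) -> Dict[str, str]:
    """Map plant code -> location name, keeping codes as strings.

    csv never converts values, so "05" stays "05" and compares equal to
    the location group captured by the sample-name regex.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row["code"]: row["location"] for row in reader}


plants = load_locations("code\tlocation\n05\tDavos\n10\tSchanf\n")
print(plants["05"])
```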
After all that, it returned the same error as before :)
Well, I don't see anything anomalous... Something weird is happening.
Could you share the compressed tallymut.tsv.zst with me (e.g. over PolyBox, SWITCHdrive, etc.) so I can try to see what's wrong?
Hi @DrYak, sorry for the late response.
Could you share the compressed tallymut.tsv.zst with me (e.g. over PolyBox, SWITCHdrive, etc.) so I can try to see what's wrong?
Here is the file on Google Drive: tallymut.tsv.zst
Thank you!
Following the guide for the wastewater experimental branch, I got stuck at the last stage of vpipe deconvolution, as you can see in this error log: deconvoluted.err.log.
More specifically, the problem happens when running this code inside LolliPop (deconvolute.py):
will return an empty data frame. I noticed that in the generated variants_pangolin.yaml file, start_date and end_date are not added in the previous step. Adding them manually does not solve the problem.
The content of the input files is as follows: results/tallymut.tsv.zst contains:
I printed only 10 of the 2096 rows of the above data frame in Python.
deconv_bootstrap_cowwid.yaml:
results/variants_pangolin.yaml:
var_dates.yaml:
What can I change to make vpipe deconvolution work? Alternatively, which Anaconda and Jupyter Notebook versions do I need for the LolliPop code to run?