I don't know how common this would be, but it doesn't seem like you can actually set barcode_runs to null in the config if you only want to build variants. There are two (really one) reasons for this:
You can't use the .dt accessor on the date column of an empty pd.DataFrame, because that column isn't datetime-like:
AttributeError in file /fh/fast/bloom_j/computational_notebooks/whannon/2023/dms-vep-pipeline-3/Snakefile, line 50:
Can only use .dt accessor with datetimelike values
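For reference, here is a minimal sketch that reproduces the same message outside the pipeline. It assumes that a null barcode_runs leaves an empty frame whose columns exist but have the default object dtype. I haven't checked how the Snakefile actually builds it, so the column list below is a guess.

```python
import pandas as pd

# Assumption: no barcode runs -> empty frame with these columns, all object dtype.
barcode_runs = pd.DataFrame(columns=["library", "sample", "date"])

try:
    # Same call the Snakefile makes when building the sample prefix.
    barcode_runs["date"].dt.strftime("%y%m%d")
    error_message = None
except AttributeError as err:
    error_message = str(err)

print(error_message)
```

The failure comes from the dtype of the date column, not from the frame being empty as such.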
You could fix this by wrapping the following code in some condition that checks if the barcode_runs are provided:
if len(barcode_runs) > 0:  # <---
    # make sure barcode run samples start with <library>-<YYMMDD>-
    sample_prefix = barcode_runs.assign(
        prefix=lambda x: (
            x["library"].astype(str) + "-" + x["date"].dt.strftime("%y%m%d") + "-"
        ),
        has_prefix=lambda x: x.apply(
            lambda r: r["sample"].startswith(r["prefix"]),
            axis=1,
        ),
    ).query("not has_prefix")
    if len(sample_prefix):
        raise ValueError(f"Some barcode run samples lack correct prefix:\n{sample_prefix}")
    # dicts mapping sample to library or date as string
    sample_to_library = barcode_runs.set_index("sample")["library"].to_dict()
    sample_to_date = (
        barcode_runs.assign(date_str=lambda x: x["date"].dt.strftime("%Y-%m-%d"))
        .set_index("sample")["date_str"]
        .to_dict()
    )
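One caveat with a plain guard is that sample_to_library and sample_to_date are never defined when there are no barcode runs, which may matter if anything later refers to them. A rough sketch of one way around that is to move the logic into a helper that returns empty dicts in that case. The name sample_maps is hypothetical and not part of the pipeline, and the prefix check is written slightly differently from the Snakefile but is meant to do the same thing.

```python
import pandas as pd


def sample_maps(barcode_runs):
    """Return (sample_to_library, sample_to_date); both empty if there are no runs.

    Hypothetical helper, not part of the pipeline.
    """
    if len(barcode_runs) == 0:
        return {}, {}

    # make sure barcode run samples start with <library>-<YYMMDD>-
    prefix = (
        barcode_runs["library"].astype(str)
        + "-"
        + barcode_runs["date"].dt.strftime("%y%m%d")
        + "-"
    )
    has_prefix = pd.Series(
        [s.startswith(p) for s, p in zip(barcode_runs["sample"], prefix)],
        index=barcode_runs.index,
        dtype=bool,
    )
    bad = barcode_runs[~has_prefix]
    if len(bad):
        raise ValueError(f"Some barcode run samples lack correct prefix:\n{bad}")

    # dicts mapping sample to library or date as string
    sample_to_library = barcode_runs.set_index("sample")["library"].to_dict()
    sample_to_date = dict(
        zip(barcode_runs["sample"], barcode_runs["date"].dt.strftime("%Y-%m-%d"))
    )
    return sample_to_library, sample_to_date


# Made-up example data, only to show the shape of the result.
runs = pd.DataFrame(
    {
        "library": ["LibA"],
        "sample": ["LibA-230101-rep1"],
        "date": pd.to_datetime(["2023-01-01"]),
    }
)
print(sample_maps(runs))
print(sample_maps(pd.DataFrame()))
```

With this shape, the Snakefile could call the helper unconditionally and downstream code would see empty dicts rather than undefined names.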
This is probably less relevant, but you'll run into an error if you forget to exclude any extra analyses that require the barcode runs downstream.
func_effects_config: data/func_effects_config.yml # Functional effects of mutations
antibody_escape_config: data/antibody_escape_config.yml # escape assays (eg, antibodies)
summaries_config: data/summaries_config.yml # Summaries across assays
Maybe it's worth also wrapping these in some kind of conditional based on the existence of barcode runs?
if len(barcode_runs) > 0:  # <---
    # include additional rule sets if they have configs defined
    for rule_set in ["func_effects", "antibody_escape", "summaries"]:
        rule_set_config = f"{rule_set}_config"
        if (rule_set_config in config) and (config[rule_set_config] is not None):
            include: f"{rule_set}.smk"
Or, maybe it's better to leave this up to the user just in case you'd have analyses defined here that don't need the barcode runs?
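A possible middle ground, sketched here in plain Python rather than Snakemake syntax, is to keep the choice with the user but fail early with a readable message when a configured rule set needs barcode runs and none are given. Both rule_sets_to_include and its needs_barcode_runs parameter are hypothetical. I'm also assuming all three rule sets need the barcode runs, which may not be true.

```python
def rule_sets_to_include(
    config,
    have_barcode_runs,
    needs_barcode_runs=("func_effects", "antibody_escape", "summaries"),
):
    """Return the .smk files to include, given the config.

    Hypothetical helper: `needs_barcode_runs` lists the rule sets assumed to
    require barcode runs (an assumption, not checked against the pipeline).
    """
    included = []
    for rule_set in ["func_effects", "antibody_escape", "summaries"]:
        rule_set_config = f"{rule_set}_config"
        # skip rule sets whose config is missing or set to null
        if config.get(rule_set_config) is None:
            continue
        if rule_set in needs_barcode_runs and not have_barcode_runs:
            raise ValueError(
                f"{rule_set_config} is set but no barcode_runs are provided; "
                f"set {rule_set_config} to null or provide barcode_runs"
            )
        included.append(f"{rule_set}.smk")
    return included


# Variants-only build: extra analyses set to null, so nothing is included.
variants_only = {
    "func_effects_config": None,
    "antibody_escape_config": None,
    "summaries_config": None,
}
print(rule_sets_to_include(variants_only, have_barcode_runs=False))
```

In the Snakefile, the returned names would then be passed to include: one at a time. Raising instead of silently skipping means a forgotten config entry is reported up front rather than surfacing as a downstream error.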