gibberwocky opened 3 hours ago
`run_find()` also requires the argument `o`, which appears to be an output file for writing the YAML:
```python
# post processing
if o:
    with open(o, "w") as f:
        yaml.dump(found, f, sort_keys=False)
else:
    print(yaml.dump(found, sort_keys=False))
```
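The post-processing branch above only prints or writes the YAML. Unless a `return` appears elsewhere in the function, a caller doing `rgns = run_find(...)` gets `None` back. A minimal sketch of the same control flow, with a hypothetical `post_process()` name and `json` standing in for `yaml.dump` so it runs without PyYAML:

```python
import json


def post_process(found, o=""):
    """Hypothetical stand-in for run_find()'s post-processing step.

    json.dump/json.dumps replace yaml.dump so the sketch needs only the stdlib.
    """
    if o:
        # A non-empty o is treated as an output path.
        with open(o, "w") as f:
            json.dump(found, f)
    else:
        # An empty o (as in o="") prints to stdout instead.
        print(json.dumps(found))
    # No return statement, so the caller always receives None.


result = post_process([{"region_id": "cDNA", "parent_id": "R1.fastq.gz"}], o="")
print(result is None)  # True
```

The YAML/JSON still appears on screen, which is why the lookup looks successful even though nothing usable is handed back.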
The first argument to `run_find()` needs to be `self.seqspec_fn` rather than `self.seqspec`, as `run_find()` executes `spec = load_spec(spec_fn)`, which has already been executed in class `UniformData` prior to calling `run_find()`. There's a bit of redundancy there, but it's the path of least resistance.
Implementing the above changes results in the following error:
```
Traceback (most recent call last):
  File "/uoa/scratch/users/s14dw4/.conda/envs/cellatlas_fork/bin/cellatlas", line 33, in <module>
    sys.exit(load_entry_point('cellatlas', 'console_scripts', 'cellatlas')())
  File "/uoa/home/s14dw4/repos/cellatlas/cellatlas/main.py", line 45, in main
    COMMAND_TO_FUNCTION[sys.argv[1]](parser, args)
  File "/uoa/home/s14dw4/repos/cellatlas/cellatlas/cellatlas_build.py", line 106, in validate_build_args
    outputs[0],
  File "/uoa/home/s14dw4/repos/cellatlas/cellatlas/UniformData.py", line 49, in __init__
    relevant_fqs = [rgn.parent_id for rgn in rgns]
TypeError: 'NoneType' object is not iterable
```
This indicates that the result of:
```python
rgns = run_find(self.seqspec_fn,
                self.modality,
                MOD2FEATURE.get(self.modality.upper(), ""),
                idtype="region-type",
                o="")
```
is `None` rather than a list of regions.
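The failure can be reproduced without seqspec: iterating over `None` in a list comprehension raises exactly this `TypeError`. A small guard (the name `require_regions` is hypothetical, not part of cellatlas, and `"rna"` is just an example modality value) would turn the silent `None` into a more descriptive error at the call site:

```python
def require_regions(rgns, modality):
    """Hypothetical guard: fail early, with context, if a find call returned None."""
    if rgns is None:
        raise ValueError(
            f"no regions returned for modality {modality!r}; "
            "the find function may have printed its result instead of returning it"
        )
    return rgns


rgns = None  # what the find call handed back in the traceback

# Reproduce the original failure from UniformData.py, line 49.
try:
    relevant_fqs = [rgn.parent_id for rgn in rgns]
except TypeError as err:
    none_error = str(err)
    print(none_error)  # 'NoneType' object is not iterable

# The guard reports the problem in terms of the caller's inputs instead.
try:
    require_regions(rgns, "rna")  # "rna" is an example modality value
except ValueError as err:
    guard_error = str(err)
    print(guard_error)
```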
`run_find()` has four possible `idtype` values: `region-type`, `region`, `read` and `file`. Only `region` prints YAML from `yaml.dump` (i.e. `found` is not `None`):
```yaml
- !Region
  region_id: cDNA
  region_type: cdna
  name: cDNA
  sequence_type: random
  sequence: X
  min_len: 1
  max_len: 150
  onlist: null
  regions: null
  parent_id: R1.fastq.gz
```
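However, `rgns` itself is still empty (`None`), presumably because `run_find()` only prints or writes the YAML, as in the post-processing code above, rather than returning `found`.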
Changing from `run_find()` to directly calling `find_by_region_id()` overcomes this issue:
```python
rgns = find_by_region_id(
    self.seqspec, self.modality, MOD2FEATURE.get(self.modality.upper(), "")
)
```
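Once `rgns` is an actual list, the comprehension on line 49 of `UniformData.py` works. The sketch below uses a hypothetical `Region` dataclass carrying a few fields from the YAML above and a stub `find_regions()` in place of seqspec's `find_by_region_id()`; it assumes the real function returns a list of region objects with a `parent_id` attribute, which is what the comprehension needs:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Region:
    """Hypothetical stand-in holding a subset of the !Region fields."""
    region_id: str
    region_type: str
    parent_id: str


def find_regions(regions: List[Region], region_id: str) -> List[Region]:
    """Stub for a find_by_region_id()-style lookup: returns a list, never None."""
    return [rgn for rgn in regions if rgn.region_id == region_id]


spec_regions = [
    Region(region_id="cDNA", region_type="cdna", parent_id="R1.fastq.gz"),
]

rgns = find_regions(spec_regions, "cDNA")
relevant_fqs = [rgn.parent_id for rgn in rgns]
print(relevant_fqs)  # ['R1.fastq.gz']

# A lookup with no match yields an empty list, which iterates safely.
print([rgn.parent_id for rgn in find_regions(spec_regions, "missing")])  # []
```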
This leads to the next error:
```
TypeError: run_index() missing 4 required positional arguments: 'idtype', 'rev', 'subregion_type', and 'o'
```
This relates to changes to `run_index()` in `seqspec_index.py`, which now requires more parameters:
```python
def run_index(
    spec_fn,
    modality,
    ids,
    idtype,
    fmt,
    rev,
    subregion_type,
    o,
):
```
than were previously passed:
```python
self.x_string = run_index(self.seqspec, self.modality, rids_in_spec, fmt="kb")
```
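One way to see which arguments an old call site is missing is to bind it against the new signature with `inspect.signature()`. The sketch copies the new `run_index()` parameter list into a stub (the body is a placeholder, not seqspec's implementation) and checks the old four-argument call, with placeholder values standing in for `self.seqspec`, `self.modality` and `rids_in_spec`:

```python
import inspect


def run_index(spec_fn, modality, ids, idtype, fmt, rev, subregion_type, o):
    """Stub with the new seqspec_index.run_index() parameter list; body is a placeholder."""
    raise NotImplementedError


def missing_params(func, *args, **kwargs):
    """Return the names of required parameters that a call would leave unbound."""
    sig = inspect.signature(func)
    bound = sig.bind_partial(*args, **kwargs)
    return [
        name
        for name, param in sig.parameters.items()
        if name not in bound.arguments and param.default is inspect.Parameter.empty
    ]


# Mirrors the old call: run_index(self.seqspec, self.modality, rids_in_spec, fmt="kb")
old_call_missing = missing_params(run_index, "spec.yaml", "rna", ["cDNA"], fmt="kb")
print(old_call_missing)  # ['idtype', 'rev', 'subregion_type', 'o']
```

The check only catches missing names, not wrong types: the first parameter is called `spec_fn`, so, as with `run_find()`, it presumably expects a file path rather than the loaded `self.seqspec`.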
The latest version of seqspec does not have a `run_find_by_type()` function in `seqspec_find.py`. It does have a `run_find()` function, which accepts the same arguments plus `idtype` and calls the relevant `find_by_xxx` function based on `idtype`. Comparing the two `seqspec_find.py` files, Claude suggests that `run_find_by_type()` has been replaced with `find_by_region_type()` and that the former `run_find()` is now `find_by_region_id()`. This implies that we should either call `find_by_region_type()` directly, or call `run_find(..., idtype="region-type")`.
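The dispatch described above, where `run_find()` picks a `find_by_xxx` function based on `idtype`, can be sketched with a dictionary lookup. The finder stubs, the dict-based spec shape and the `find()` name are all hypothetical, not seqspec's code; the sketch only shows that such a dispatcher is usable from `UniformData` when it returns the finder's result rather than just printing it:

```python
from typing import Callable, Dict, List

# Hypothetical data shapes: a spec maps modality -> list of region dicts.
Region = Dict[str, str]
Spec = Dict[str, List[Region]]


def stub_find_by_region_id(spec: Spec, modality: str, region_id: str) -> List[Region]:
    """Stub standing in for find_by_region_id(): match on region_id."""
    return [r for r in spec.get(modality, []) if r["region_id"] == region_id]


def stub_find_by_region_type(spec: Spec, modality: str, region_type: str) -> List[Region]:
    """Stub standing in for find_by_region_type(): match on region_type."""
    return [r for r in spec.get(modality, []) if r["region_type"] == region_type]


# idtype -> finder; "read" and "file" are left out of this sketch.
FINDERS: Dict[str, Callable[[Spec, str, str], List[Region]]] = {
    "region": stub_find_by_region_id,
    "region-type": stub_find_by_region_type,
}


def find(spec: Spec, modality: str, value: str, idtype: str) -> List[Region]:
    """Dispatch on idtype and return the matches rather than printing them."""
    try:
        finder = FINDERS[idtype]
    except KeyError:
        raise ValueError(f"unsupported idtype: {idtype!r}") from None
    return finder(spec, modality, value)


spec: Spec = {
    "rna": [{"region_id": "cDNA", "region_type": "cdna", "parent_id": "R1.fastq.gz"}],
}

print([r["parent_id"] for r in find(spec, "rna", "cDNA", idtype="region")])       # ['R1.fastq.gz']
print([r["parent_id"] for r in find(spec, "rna", "cdna", idtype="region-type")])  # ['R1.fastq.gz']
```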