Add checks for MM and ML tag consistency

PacificBiosciences / pb-CpG-tools

Collection of tools for the analysis of CpG data

BSD 3-Clause Clear License


Exception thrown in worker process 1913520: Exception thrown while processing read m64107_201211_205017/160762333/ccs: Base modification offsets in MM tag for modification type 'C+m' are inconsistent with read length
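For context, a minimal sketch of the kind of check that could produce an error like the one above. It is not the pb-CpG-tools implementation. It assumes the SAM-spec meaning of MM offsets (each value is the number of bases of the stated type to skip before the next modified one) and ignores reverse-strand handling. The function name and the error wording are hypothetical.

```python
from typing import List


def mm_offsets_to_positions(seq: str, base: str, offsets: List[int]) -> List[int]:
    """Convert MM skip counts into 0-based positions in seq.

    Hypothetical helper: assumes forward-strand orientation only.
    """
    # Positions of every occurrence of the canonical base (e.g. 'C') in the read.
    base_positions = [i for i, b in enumerate(seq) if b == base]

    positions = []
    idx = -1  # index into base_positions of the last modified base
    for skip in offsets:
        idx += skip + 1  # skip `skip` unmodified bases, then take the next one
        if idx >= len(base_positions):
            # The tag refers to more bases of this type than the read contains.
            raise ValueError(
                f"MM offsets for base '{base}' are inconsistent with read length"
            )
        positions.append(base_positions[idx])
    return positions
```

With seq "ACGCCTC" the C bases sit at positions 1, 3, 4 and 6, so offsets [0, 2] resolve to [1, 6], while offsets [1, 0, 1] run past the last C and raise.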

Yes. First of all, sorry: a file-wide autoformat ran in the IDE and I didn't pull that back out before the PR, so that's obscuring things a bit.

get_modline() will return None (new behavior)

The old logic was compressed onto one line, next(x[len(modcode)+1:] for x in mmtag.split(';') if x.startswith(modcode)), and was designed to throw an exception to signal the case of an empty MM tag (meaning just "MM:Z:"). This prevented us from using exceptions to convey any other type of error, so that case is now made explicit with the None value/empty list return.
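To illustrate the difference, here is a rough sketch of the two styles. The old version wraps the one-liner quoted above. The new version is an assumption about what an explicit None return could look like, not the code from the PR, and the function names with _old/_new suffixes are made up for the comparison.

```python
from typing import Optional


def get_modline_old(mmtag: str, modcode: str) -> str:
    # Old style: next() without a default raises StopIteration when no
    # entry matches, which is what happens for an empty MM tag.
    return next(x[len(modcode) + 1:] for x in mmtag.split(';') if x.startswith(modcode))


def get_modline_new(mmtag: str, modcode: str) -> Optional[str]:
    # Hypothetical explicit version: None means "no entry for this modcode",
    # leaving exceptions free to report genuinely malformed tags.
    for entry in mmtag.split(';'):
        if entry.startswith(modcode):
            return entry[len(modcode) + 1:]
    return None
```

For an MM value of "C+m,5,12,0;" both return "5,12,0". For an empty value the old version raises StopIteration and the new one returns None.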

parse_mmtag() will then return an empty list (vs. list of base indices) (new behavior)

If the MM tag is empty, it previously returned an empty generator via the try/except; now it returns an empty list. Most of the generators used for MM/ML parsing were changed to lists here to improve error reporting. This makes everything run a little slower but lets us more easily check for and correctly describe more errors (like a mismatch in the number of MM and ML values).
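As a sketch of why lists help here: once both tags are materialized, their lengths can be compared directly and the mismatch described in the error. The helper names below are hypothetical, and the example assumes a single modification type (with several types, the ML array holds the values for all of them back to back).

```python
from typing import List, Optional


def parse_mm_offsets(modline: Optional[str]) -> List[int]:
    # None (no entry for this modcode) or an empty string gives an empty list.
    if not modline:
        return []
    return [int(tok) for tok in modline.split(',')]


def check_mm_ml_consistency(mm_offsets: List[int], ml_values: List[int], modcode: str) -> None:
    # A generator has no len(); lists make this comparison trivial.
    if len(mm_offsets) != len(ml_values):
        raise ValueError(
            f"MM tag for modification type '{modcode}' has {len(mm_offsets)} "
            f"offsets but ML tag has {len(ml_values)} values"
        )
```

For example, parse_mm_offsets("5,12,0") gives [5, 12, 0], and checking that against an ML list of two values raises a ValueError that names both counts.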

parse_mltag() will return an empty list (old behavior)

Yes, previously an empty generator but same idea.

get_mod_dict() will return an empty dictionary (this could happen previously too?)

Yes.

Add checks for MM and ML tag consistency #23