LooseLab / readfish

CLI tool for flexible and fast adaptive sampling on ONT sequencers
https://looselab.github.io/readfish/
GNU General Public License v3.0

add dynamic reloading of mapping index #228

Closed W-L closed 1 year ago

W-L commented 1 year ago

Hi all,

As just discussed, I have been working on dynamically reloading mm2 mapping indices, which we need in order to map to the constantly updated assemblies. I added this in basically the same way as the reloading of target masks: whenever we generate a new assembly (or new masks), we create an empty marker-file called contigs.updated (as previously masks.updated). readfish checks whether this marker-file is present and, if so, reloads the files and deletes the marker-file. When reloading an index, it creates a new instance of your CustomMapper() and replaces the previous mapper object with the new one.

The changes are all in ru/ru_gen_boss_runs.py. Hope there won't be any conflicts with the changes you mentioned about not unblocking during mux scan phases?

Thanks,
Lukas
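For readers following along, the marker-file pattern described above could look roughly like the sketch below. This is not the code from the PR: the names reload_if_marked, Caller, maybe_reload_mapper and make_mapper are hypothetical, and placing contigs.updated inside the contigs directory is an assumption. Only the marker-file name and the "reload, delete marker, swap mapper" order come from the description.

```python
from pathlib import Path
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def reload_if_marked(
    directory: Path, marker_name: str, loader: Callable[[Path], T]
) -> Optional[T]:
    """Return a freshly loaded object if the marker-file exists, else None."""
    marker = Path(directory) / marker_name
    if not marker.is_file():
        # Nothing new was written; the caller keeps its current object.
        return None
    # Build the replacement first, then remove the marker so the same
    # update is not picked up again on the next check.
    new_obj = loader(Path(directory))
    marker.unlink()
    return new_obj


class Caller:
    """Hypothetical stand-in for whatever object holds the current mapper."""

    def __init__(self, mapper):
        self.mapper = mapper

    def maybe_reload_mapper(self, contigs_dir, make_mapper):
        # make_mapper is an assumed factory that builds a new mapper
        # from the files found in contigs_dir.
        new_mapper = reload_if_marked(contigs_dir, "contigs.updated", make_mapper)
        if new_mapper is not None:
            self.mapper = new_mapper  # replace the previous mapper object
```

When the marker-file is never created, maybe_reload_mapper does nothing, so a run without updated assemblies keeps its original mapper.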

alexomics commented 1 year ago

Here is the mux scan check as a patch: phase_check.patch. You can apply this to your branch using:

git apply phase_check.patch

It might be best to move this to another sub-command e.g. boss-runs-asm or boss-runs-assembly if the way that it functions is distinct from the original boss-runs script. Are there any changes that you expect to the input configuration for readfish?

W-L commented 1 year ago

Thanks for the patch, I'll have a look asap.

The changes in the PR are all intended to be backwards-compatible: we still reload masks the same way as before and just additionally enable reloading of a mapping index. If that functionality is not needed (i.e. original boss-runs), the attempt to reload the index will just return, since the marker-file is never found. I thought this was simpler than creating another version.

The input config for readfish gets an additional line. E.g.:

[conditions.1]
name = "zymo"
control = false
.
.
.
mask = "out_zymo/masks"           <- this was previously added to pick up the updated target masks
contigs = "out_zymo/contigs"      <- this is new to point readfish to where the updated index will be found

I just assumed that adding another keyword should work since we already had get_run_info(... validate=False) to hack the addition of mask=?
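As a rough illustration of why the extra keyword stays backwards-compatible, the sketch below reads the optional mask and contigs entries from an already parsed condition table. The function optional_reload_dirs is hypothetical and not part of readfish, and the dictionaries simply mirror the TOML fragment above; how readfish itself passes these keys through is not shown here.

```python
from pathlib import Path
from typing import Dict, Optional


def optional_reload_dirs(condition: dict) -> Dict[str, Optional[Path]]:
    """Pick up the optional 'mask' and 'contigs' keys from one condition table."""
    dirs = {}
    for key in ("mask", "contigs"):
        value = condition.get(key)  # None when the key is absent
        dirs[key] = Path(value) if value is not None else None
    return dirs


# A condition with the new key, mirroring the fragment above.
new_style = {
    "name": "zymo",
    "control": False,
    "mask": "out_zymo/masks",
    "contigs": "out_zymo/contigs",
}
# A condition from an older config that has neither key.
old_style = {"name": "zymo", "control": False}

new_dirs = optional_reload_dirs(new_style)
old_dirs = optional_reload_dirs(old_style)
```

With an older config both entries come back as None, so the reload checks can be skipped without any change to existing setups.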

alexomics commented 1 year ago

That's all good then. Without validate=True the contigs path will be fed through. Once the pausing during the mux scan is added we can merge this in.

W-L commented 1 year ago

Thanks! I applied the patch, added some imports for it and a few other small bits from your recent commits to the dev_staging branch that looked useful. Should be fine, but I'm also happy to give it a test run over the next few days if you'd prefer that before merging?

Adoni5 commented 1 year ago

Gonna close this as stale 😂