goldman-gp-ebi / BOSS-RUNS

Dynamic, adaptive sampling during nanopore sequencing
GNU General Public License v3.0

Remote connection #3

Open: cmfield opened this issue 1 year ago

cmfield commented 1 year ago

We had issues establishing a remote connection to test and run BOSS-RUNS: our sequencing device is not directly connected to a suitable machine. After asking the readfish developers for help (their tests also failed), I tracked the issue down to read_until_api. You say to install v3.0.0, but that version cannot connect to remote hosts; it will only connect to localhost.

I hope there will be no issues down the line with read_until_api v3.4.1, but if there are, a custom version of read_until_api may be needed to run BOSS-RUNS remotely.
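A minimal sketch of a guard a tool could run before connecting, assuming (as reported in this thread) that read_until_api v3.0.0 only connects to localhost and that v3.4.1 works with remote hosts. Versions in between were not tested here, so the sketch conservatively treats anything below 3.4.1 as local-only. The function names and the threshold are hypothetical; they are not part of BOSS-RUNS or read_until_api.

```python
# Hypothetical guard, not part of BOSS-RUNS or read_until_api.
# Assumption from this thread: v3.0.0 is localhost-only, v3.4.1 handles remote hosts.
MIN_REMOTE_VERSION = (3, 4, 1)
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def parse_version(version: str) -> tuple:
    """Turn '3.4.1' into (3, 4, 1); stops at the first non-numeric component."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_remote_support(host: str, api_version: str) -> None:
    """Raise if a non-local host is requested with a version assumed to be localhost-only."""
    if host in LOCAL_HOSTS:
        return
    if parse_version(api_version) < MIN_REMOTE_VERSION:
        minimum = ".".join(map(str, MIN_REMOTE_VERSION))
        raise RuntimeError(
            f"read_until_api {api_version} is assumed to connect only to localhost; "
            f"install >= {minimum} to reach {host}"
        )
```

For example, `check_remote_support("192.168.1.20", "3.4.1")` passes silently, while the same call with `"3.0.0"` raises before any connection is attempted.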

cmfield commented 1 year ago

One additional problem I am seeing when running remotely: when checking for fastq files, bossruns.py gets the MinKNOW output path, but then assumes it is a path on the machine it is running on, rather than on the remote device. I think I can work around this by synchronising paths between the MinKNOW device and the server running bossruns.py, but it would be better if I didn't have to.
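If the sequencer's output directory is mirrored locally (by rsync, sshfs or similar), the path reported by MinKNOW could be translated before looking for fastq files, instead of recreating identical paths on both machines. A minimal sketch, assuming a hypothetical user-supplied `remote_root`/`local_root` pair; BOSS-RUNS has no such options.

```python
from pathlib import Path, PurePosixPath


def remap_output_path(reported: str, remote_root: str, local_root: str) -> Path:
    """Translate a MinKNOW output path on the sequencer into the local mirror.

    Hypothetical helper: remote_root is the directory on the sequencing machine
    that is mirrored to local_root on the machine running bossruns.py.
    """
    reported_path = PurePosixPath(reported)
    # relative_to raises ValueError if the reported path lies outside remote_root
    relative = reported_path.relative_to(PurePosixPath(remote_root))
    return Path(local_root).joinpath(*relative.parts)
```

For instance, a reported `/data/run1/fastq_pass` with `remote_root="/data"` and `local_root="/mnt/seq"` maps to `/mnt/seq/run1/fastq_pass`.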

W-L commented 1 year ago

Hi, thanks for getting in touch! I don't anticipate any issues with using read_until_api v3.4.1, but equally have not tested it. We simply used the same version as readfish at the time of starting development on BOSS-RUNS. Since then I have not needed to test a newer version of the API, but if it is backwards compatible, there should not be any issues. It'd be great if you could let me know whether it works for you. I'll then run a quick test and increment the dependency version.

Concerning the issue with the output path: how are you connected to the remote machine? Do you have its filesystem mounted with sshfs or something similar? A straightforward hack would be to add an argument to BOSS-RUNS that lets you provide the path where it should look for fastq files, instead of getting it from the API. Do you think that would solve your issue? The problem is that MinKNOW usually only creates the output directories after the run has started, so having to edit configuration files or arguments after sequencing has begun is not very convenient.
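One way the suggested override could sidestep the timing problem is to accept a parent directory that already exists before sequencing starts and wait until fastq files appear anywhere beneath it. This is a sketch under assumptions: the `--fastq-dir` flag and the helper names are hypothetical and do not exist in BOSS-RUNS.

```python
import argparse
import time
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Hypothetical fastq-path override")
    # Hypothetical flag: a directory that exists before the run starts
    parser.add_argument("--fastq-dir", type=Path, default=None,
                        help="look for fastq files under this directory instead of asking MinKNOW")
    return parser


def find_fastq(root: Path) -> list:
    """Return all plain or gzipped fastq files anywhere below root, sorted."""
    found = set()
    for pattern in ("*.fastq", "*.fq", "*.fastq.gz", "*.fq.gz"):
        found.update(root.rglob(pattern))
    return sorted(found)


def wait_for_fastq(root: Path, poll_seconds: float = 30.0, timeout: float = 3600.0) -> list:
    """Poll until at least one fastq file exists below root, or raise after timeout."""
    deadline = time.monotonic() + timeout
    while True:
        files = find_fastq(root) if root.is_dir() else []
        if files:
            return files
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no fastq files appeared under {root}")
        time.sleep(poll_seconds)
```

Searching recursively means the run-specific subdirectories MinKNOW creates later do not need to be known in advance.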

cmfield commented 1 year ago

I don't think v3.4.1 caused any issues, although we did see some mysterious behaviour in a 72-hour run. Everything was fine after about 24 hours, so we left it running over the weekend. It didn't crash during that time, but it either got stuck calculating something for two days or froze: it stopped checking and updating the masks for that whole period, and there's nothing in the log. It's rather baffling.
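A silent stall like this could at least be made visible in the log with a watchdog that warns whenever the mask-update loop has not reported progress for too long. A minimal sketch, assuming the update loop could call a hypothetical `beat()` after each mask update; nothing like this is claimed to exist in BOSS-RUNS.

```python
import logging
import threading
import time


class StallWatchdog:
    """Hypothetical watchdog: warns if beat() has not been called within max_silence seconds."""

    def __init__(self, max_silence: float, clock=time.monotonic):
        self.max_silence = max_silence
        self._clock = clock
        self._last_beat = clock()
        self._lock = threading.Lock()

    def beat(self) -> None:
        """Call after each successful mask update."""
        with self._lock:
            self._last_beat = self._clock()

    def silence(self) -> float:
        """Seconds since the last beat."""
        with self._lock:
            return self._clock() - self._last_beat

    def is_stalled(self) -> bool:
        return self.silence() > self.max_silence

    def run(self, check_every: float, stop: threading.Event) -> None:
        """Background loop: log a warning on each check while stalled, until stop is set."""
        while not stop.wait(check_every):
            if self.is_stalled():
                logging.warning("no mask update for %.0f s", self.silence())
```

It would be started with something like `threading.Thread(target=wd.run, args=(60, stop), daemon=True).start()`, so a frozen update loop still leaves a timestamped trace even if nothing else is logged.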

To solve the output path issue, I set up an rsync loop between the sequencer and the server we were running BOSS-RUNS on, so it wasn't a complete barrier. It would still be good to be able to do this without mirroring the exact directory structure, so providing a path as you suggest would help, since accessing the files remotely probably isn't trivial to set up given the extra password requirements.
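The rsync workaround can be sketched as a loop that copies only fastq files into a local directory of your choosing. Host, paths and interval are placeholders, and the sketch assumes key-based SSH access, since rsync over SSH would otherwise prompt for a password. The filter rules (`--include=*/`, `--include=*.fastq*`, `--exclude=*`) let rsync descend into every directory while skipping non-fastq files, and `--prune-empty-dirs` drops directories left empty. The layout below the source directory is still preserved; only the root changes.

```python
import subprocess
import time


def rsync_command(remote: str, remote_dir: str, local_dir: str) -> list:
    """Build an rsync call that copies only fastq files from the sequencer.

    remote is e.g. "user@sequencer"; all values here are placeholders.
    """
    return [
        "rsync", "-a", "--prune-empty-dirs",
        "--include=*/",        # descend into every directory...
        "--include=*.fastq*",  # ...copy fastq and fastq.gz files...
        "--exclude=*",         # ...and skip everything else
        f"{remote}:{remote_dir.rstrip('/')}/",  # trailing slash: copy contents, not the dir
        local_dir,
    ]


def sync_forever(remote: str, remote_dir: str, local_dir: str, interval: float = 60.0) -> None:
    """Re-run rsync every `interval` seconds; a failed pass is retried on the next round."""
    cmd = rsync_command(remote, remote_dir, local_dir)
    while True:
        result = subprocess.run(cmd, check=False)
        if result.returncode != 0:
            print(f"rsync exited with {result.returncode}; retrying in {interval} s")
        time.sleep(interval)
```

Passing the command as a list avoids shell quoting of the `*` patterns.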

I was also going to ask how the reference-free development is coming along. It seems like this would have to be done from scratch without readfish, since readfish doesn't let you anti-target (filter?) based on a reference, and I wasn't sure what the best approach would be for determining what has already been seen.
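For determining what has already been seen without a reference, one simple option (a sketch of a generic technique, not the approach BOSS-RUNS takes) is to keep a set of k-mers from finished reads and score each new read prefix by the fraction of its k-mers that are new. The value of k, the threshold and all names are illustrative assumptions; a practical version would need canonical k-mers for both strands, tolerance to sequencing errors, and bounded memory.

```python
def kmers(seq: str, k: int) -> set:
    """All length-k substrings of seq (forward strand only in this sketch)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


class SeenTracker:
    """Hypothetical reference-free novelty tracker based on exact k-mer matches."""

    def __init__(self, k: int = 15, min_novel_fraction: float = 0.5):
        self.k = k
        self.min_novel_fraction = min_novel_fraction
        self.seen = set()

    def novel_fraction(self, prefix: str) -> float:
        """Fraction of the prefix's k-mers not observed before."""
        ks = kmers(prefix, self.k)
        if not ks:
            return 1.0  # too short to judge; treat as novel
        return len(ks - self.seen) / len(ks)

    def decide(self, prefix: str) -> bool:
        """True = keep sequencing (mostly new sequence), False = reject (mostly seen)."""
        return self.novel_fraction(prefix) >= self.min_novel_fraction

    def add(self, seq: str) -> None:
        """Record the k-mers of a fully sequenced read."""
        self.seen |= kmers(seq, self.k)
```

Exact k-mer matching is the crudest choice here; it makes the "already seen" state explicit but ignores coverage depth, which an adaptive-sampling decision would likely also need.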