Tensorboard and qiime2 - Githubissues

mortonjt commented 5 years ago

At the moment, tensorboard is disabled when running the qiime2 songbird command. The original motivation was because an additional logging directory needs to be created in order to enable this.

I'm not sure how qiime2 is expected to handle logging directories - should they be typed?

Thoughts @fedarko @ebolyen?

ebolyen commented 5 years ago

Is the goal to keep this directory around? Creating a well managed temporary directory is just fine from QIIME 2's perspective (dumping things into the current working directory or specific locations is usually where we take issue).

mortonjt commented 5 years ago

Right, the goal would be to create a logging directory that can be viewed through Tensorboard after the run is completed. This can help users debug their runs and compare have different runs differ (i.e. compare and contrast different covariates, priors, ...).

PR #71 currently creates this logging dir in the current directory (it has been disabled previously).

ebolyen commented 5 years ago

I see! We don't have an equivalent concept, but @thermokarst and I have been chatting recently about "diagnostic" outputs. This could help inform that design. From poking around, it looks like tensorboard is a server, and probably requests data from the logging directory as needed. So it doesn't fit into a full visualization on the surface.

fedarko commented 5 years ago

Does #71 (just writing to a logging directory, ignoring the standard notion of "output" in QIIME 2) have any actual downsides, aside from maybe not being the most elegant solution? Using future functionality baked in to QIIME 2 to handle this would be super nice, but IMO what's really important for now is making these diagnostics available to the user—to my understanding, this is pretty much required to make running Songbird (or mmvec?) useful, and while I can't speak on mmvec this isn't available when running qiime songbird right now.

One quick way to avoid the problem of dumping stuff into the current working directory (sort of :) might be adding a required --p-diagnostic-directory parameter (ideally specified relative to the CWD), and then having tensorflow output diagnostics to there. I recognize that this doesn't really fit with the current paradigm of having QIIME 2 take care of the I/O, but my vote would be strongly for a temporary solution like that instead of letting the problem sit.

ElDeveloper commented 5 years ago

In lack of a better framework-based solution. Would it make sense to only create this "diagnostic directory" under the current directory if the user activates the --verbose flag? The directory could be automatically named using the parameters for the model, and if --verbose is not used then no directory is created.

On (Sep-17-19| 9:05), Jamie Morton wrote:

At the moment, tensorboard is disabled when running the qiime2 songbird command. The original motivation was because an additional logging directory needs to be created in order to enable this.

I'm not sure how qiime2 is expected to handle logging directories - should they be typed?

Thoughts @fedarko @ebolyen?

-- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/biocore/songbird/issues/70

mortonjt commented 5 years ago

RE @fedarko / @ElDeveloper , that's basically what the standalone CLI is currently doing - if the user specifies a logging directory, it'll save Tensorboard files there. Otherwise, it'll automatically create a folding named by timestamp.

The main reason why I didn't push this into the q2 CLI this seems to go against q2 standards - all inputs / outputs need to be modularized in some shape or form.

Another work around is to have another output type (i.e. tensorboard type) that can be unzipped and visualized into Tensorboard. The diagnostics is just one big json - so it shouldn't be too big of an issue to just have a type for that.

Then it'll raise the question, where will this type live? Will this be in main q2_types? Or should this be in an auxiliary repo? This will be something that all packages using pytorch / Tensorflow will likely rely on.

fedarko commented 5 years ago

The diagnostics is just one big json - so it shouldn't be too big of an issue to just have a type for that.

Wait, really? In my experience running Songbird outside of QIIME 2, I always got a folder with 5 files (not counting the output differentials), and none of these files are JSON. (Attaching screenshot of what this looks like. The events.out.tfevent... file is by far the biggest file here, weighing in at 992 MB.)

If saving tensorboard stuff without doing it through Q2 is really really really not kosher, then an alternative would be not following through with #63 and instead improving the qiime songbird summarize-single command to make its visualization a reasonable-enough approximation of tensorboard (at minimum, adding like a sentence next to each plot that says "this is what this plot represents"—would probably be possible to straight-up copy this from the README's FAQ). This doesn't have to be fancy, but it would enable basic diagnostics in a Q2-compliant way.

Regardless of which solution people want (just writing TF results to a logging dir, making TF output a Q2 type somehow, improving the current summary command to produce a visualization with adequate labels and context), I'm happy to help with the corresponding implementation. Just let me know.

mortonjt commented 5 years ago

I misremembered the actual format -- that can be found here.

But yes, the events.out* contains all of the logging information -- the other files are for checkpointing (which we often don't need to use in songbird -- that's only for jobs that takes days to run).

Regarding your comment on the summary commands - that was the initial purpose. Do you have specific issues that you ran into? Feel free to raise issues so that we can start patching them.

fedarko commented 5 years ago

If we can hammer out #72 satisfactorily, then I think kicking the can down the road on getting Tensorboard + Q2 more tightly integrated is ok (...I suppose it's not really my call anyway :). Thanks all for the discussion, and sorry for the rants.

ebolyen commented 5 years ago

So I looked more closely at how tensorboard works, and while its not beyond the realm of possibility to static-ify the program (see https://github.com/tensorflow/tensorboard/issues/2045; there is a jupyter client, which expects a backend, but like described in the issue, we have a way to "cache" those requests in QZVs if we can change the host for the embedded client). That seems like a lot of work and is kind of brittle.

Pragmatically I am fine with a --p-diagnostic-dir or --p-tensorboard-log however supersystems (like Qiita) would probably prefer that their permanent filesystem is not be arbitrarily changed, which is why we so strongly discourage that kind of thing.

I am of course all for having nicer visualizations as well (I might recommend Vega here). Also, there's no reason we couldn't have our cake and eat it too, we could put the log directory into the visualization, so that savvy users can still find it, it's still attached to provenance, and a sufficiently advanced visualization can read directly from it in an ad-hoc way.

biocore / songbird

Tensorboard and qiime2 #70