PyWorkflowApp / visual-programming

A Python Visual Programming Workspace for Data Science
MIT License
31 stars 12 forks source link

File naming/access in `pyworkflow` vs. Django vs. CLI #43

Closed reddigari closed 4 years ago

reddigari commented 4 years ago

Realized #42 was not the right place for this discussion, but this is a follow up on the comment I left there:

This is related to the first bug I worked around in #41 --- pyworkflow nodes need to know the absolute path of any files on disk, which were probably created with Django file storage. Obviously one fix (currently being used) is to use that API inside pyworkflow, but that's a separation of concerns issue (that would prevent us from using pyworkflow without the webserver entirely). My workaround in https://github.com/matthew-t-smith/visual-programming/commit/21ae095d428d4106aecca535d761bdb296a662d1 determines the absolute path from within the Django app, and passes it into the pyworkflow node factory. But I'm not sure whether that will work for all cases.

When we do deal with the CLI --- we could inject some filesystem abstraction into the pyworkflow.Workflow constructor. From the web app it would be Django's FileSystemStorage object, and for the CLI, it could be some custom wrapper of the user's filesystem (mounted in the Docker container? idk.) Then all files can be referred to by their with basenames, and pyworkflow can look them up in whatever filesystem it was given.

However, I'd say this complication is a point in favor of Diego's idea of having the CLI be a client of the Django app rather than of pyworkflow directly --- then everything can be done through the Django file storage API (although how the user will "upload" their data files when running in CLI mode is going to be tricky).

reelmatt commented 4 years ago

I also agree with Diego's idea of having the CLI be a client of the Django app with files handled by the FileSystemStorage. But I also liked your idea Samir of moving the file system into the Workflow constructor since this can condense the number of places we have to deal with the fs object: define it once in the Workflow and then use the Workflow for all file operations/lookup.

reddigari commented 4 years ago

I'm still reading through the PR and I don't have a fully-formed opinion yet, but I actually meant moving FileSystemStorage entirely OUT of pyworkflow rather than entirely in. It's doesn't appear to be designed to work outside of a configured Django app (obviously not a problem because we are using pyworkflow inside a Django app), but it means pyworkflow will never stand alone. Again, that's not necessarily an issue for us, but it means we:

These aren't deal breakers, and if it's the quickest way to meet our requirements, I'm totally down with it. But if we're looking to keep the Django concerns in the Django app, is there any downside to just passing a prefix path into the Workflow object rather than passing or creating a FileSystemStorage instance?

reelmatt commented 4 years ago

That does make more sense and I think is compatible with the latest PR (after some more changes).

In #47, a Workflow is created roughly like Workflow(file_system=FileSystemStorage()). Then, Workflow methods use this by self._file_system.open().

As you mention, passing a prefix path is an option (that I don't see any downsides with). For the Django use case we could do this simply by Workflow(file_system=fs.location) or even Workflow(file_system=settings.MEDIA_ROOT) to avoid FileSystemStorage altogether. For a CLI/other use case we could do something like Workflow(file_system='/root_dir_here'). Then, Workflow methods would make standard Python I/O calls something like

file_path = os.path.join(self._file_system, file_to_open)
with open(file_path, 'r') as f:
    ...

with some polishing of course. I like that a lot better actually.

matthew-t-smith commented 4 years ago

Just catching up here - sounds like y'all have a good direction. If we ended up needing more than this file_system, an idea could be to factor out into a Context class instance that gets passed around. Probably too heavy for just this case, but just thinking out loud.