Open steppi opened 3 years ago
I agree that if pystow is used as a dependency of some package that someone installs, it might not be clear what it is and what the .data folder is. I wonder if instead of naming it .pystow_data, which assumes someone knows what pystow is, naming it simply .pydata would at least convey the fact that "this is a data folder for Python packages".
I've never used a package that created a .data folder, so I'm curious if you guys or anyone else has an example of another package that's doing this that would create the conflict.
I don't want to make the name python-specific nor pystow-specific because the concept transcends languages. I have actually been planning to write an R port of this that should have a similar interface (and potentially have a higher impact, since R users are terrible with reproducibility).
There are so many application-specific folders littering the home directory now that I'd rather keep this one generic as a reminder that it should supersede the other ones.
I'm not aware of any such packages either. My thinking was based on the sheer number of software and data professionals in the world, as well as hobbyists, and the impossibility of knowing if any are using a .data folder, especially for the case of custom/bespoke workflows that wouldn't be public knowledge. That it is such a generic name, and that it seemed like an obvious choice to you, makes it seem at least plausible to me that it could also be an obvious choice for someone else.
You bring up a good point though. If you want to create something that is not pystow- or language-specific, then a somewhat generic name is in order. In this case, I think you need to start thinking about branding. If you want this thing in its language-agnostic form to gain mindshare, maybe it should have a pithy name to help brand it. In any case, I've updated adeft to use appdirs to place the models in the platform-specific user data location (which I think is a better fit for adeft), so I no longer feel personally responsible here.
I agree with having the layout transcend language or a particular implementation, but there is a set of conventions at play here, so I think having a more specific name is warranted, and I agree with @steppi that it's not unlikely other applications will choose .data.
Can someone point to a concrete example of another application (any platform/language) that’s using the .data directory in the home folder?
If that’s really an issue, there are several ways to configure where pystow puts its home directory: by specifying it explicitly or by falling back to the XDG standard.
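To make the XDG fallback mentioned above concrete, here is a minimal stdlib-only sketch of how a tool could resolve a per-user data directory under the XDG Base Directory convention. It is not pystow's own code; `xdg_data_home` and `app_data_dir` are hypothetical helper names used only for illustration.

```python
import os
from pathlib import Path


def xdg_data_home(env=None):
    """Return the XDG base data directory for the current user.

    Hypothetical helper for illustration; not part of pystow.
    """
    env = os.environ if env is None else env
    value = env.get("XDG_DATA_HOME", "")
    # The XDG convention ignores unset, empty, or relative values
    # and falls back to $HOME/.local/share.
    if value and os.path.isabs(value):
        return Path(value)
    return Path.home() / ".local" / "share"


def app_data_dir(app, env=None):
    """Return the directory a given app would use under the XDG data home."""
    return xdg_data_home(env) / app
```

With this layout, an app's data lands under a directory that users already expect tools to write to, rather than a new top-level folder in the home directory.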
Can someone point to a concrete example of another application (any platform/language) that’s using the .data directory in the home folder?
I’m not aware of any examples. To summarize my thoughts:
Even if the frequency of clashes is very small, that a clash could lead to catastrophic results for a user is enough to scare me away from using pystow in one of my own packages. Absence of evidence isn’t necessarily evidence of absence, and that I’m not aware of any applications using a .data directory doesn’t make me feel secure given my complete ignorance of the bespoke workflows that are used in different teams/groups/labs.
That said, I don’t intend to push any further on this and hope I don’t come off as too aggressive.
Whatever the outcome of this discussion, CLIs depending on pystow MUST provide a way to change the location. I have now had cases, for example using OAK with ODK, where all the processing outside PWD is lost after the run because the process (say, an ontology query) is running inside the Docker container. In fact, it's possible that the caller does not have write access outside PWD at all, which needs to be considered.
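One way a CLI could meet that requirement is sketched below, under stated assumptions: the `--data-dir` flag, the `MYTOOL_DATA_DIR` variable, and the `mytool` name are all hypothetical, and none of this is OAK's, ODK's, or pystow's actual interface. An explicit flag wins, then an environment variable, then the home-directory default.

```python
import argparse
import os
from pathlib import Path


def build_parser():
    """Build a parser for a hypothetical CLI called 'mytool'."""
    parser = argparse.ArgumentParser(prog="mytool")
    parser.add_argument(
        "--data-dir",
        type=Path,
        default=None,
        help="where to store downloaded data (overrides MYTOOL_DATA_DIR)",
    )
    return parser


def choose_data_dir(args, env=None):
    """Pick the data directory: flag first, then env var, then default."""
    env = os.environ if env is None else env
    if args.data_dir is not None:
        return args.data_dir
    if env.get("MYTOOL_DATA_DIR"):
        return Path(env["MYTOOL_DATA_DIR"])
    # Assumed default, mirroring the $HOME/.data layout discussed here.
    return Path.home() / ".data" / "mytool"
```

Inside a container, a caller could then pass something like `--data-dir ./cache` so that everything stays under PWD, where write access is available and results persist after the run.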
I noticed in the docs that the location of the data is configurable:
If you want to use an alternate folder name to .data inside the home directory, you can set the PYSTOW_NAME environment variable. For example, if you set PYSTOW_NAME=mydata, then the following code for the pykeen app will create the $HOME/mydata/pykeen/ directory
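A stdlib-only sketch of the behavior the quoted docs describe may help here. This is an illustration, not pystow's implementation: `data_dir` is a hypothetical helper, and only the `PYSTOW_NAME` variable and the `.data` default come from the docs quoted above.

```python
import os
from pathlib import Path


def data_dir(app, env=None):
    """Return $HOME/<name>/<app>, where <name> defaults to '.data'.

    Hypothetical helper mimicking the documented PYSTOW_NAME behavior.
    """
    env = os.environ if env is None else env
    name = env.get("PYSTOW_NAME", ".data")
    # A real tool would also create the directory here, for example
    # with Path.mkdir(parents=True, exist_ok=True).
    return Path.home() / name / app
```

Under these assumptions, setting `PYSTOW_NAME=mydata` makes `data_dir("pykeen")` resolve to `$HOME/mydata/pykeen`, and leaving it unset resolves to `$HOME/.data/pykeen`.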
Pystow does not check whether there is an existing .data directory on the user's system and happily commandeers this folder even if it already exists. Since this is a very common and generic name, it is not unlikely that a user may already have a .data directory in their home folder. It is also not unlikely that a user will end up in a situation where they, or some software they have installed other than pystow, will try to place a .data directory in their home folder. I suggest changing to a less generic name such as ".pystow_data" to avoid potential naming conflicts. I think just having the possibility of changing the default folder name with an environment variable is insufficient, because the direct users of pystow are Python package developers, not Python package users. We should seek to minimize any cognitive burden or sources of surprise for end users of Python packages that use pystow.