glotzerlab / signac

Manage large and heterogeneous data spaces on the file system.
https://signac.io/
BSD 3-Clause "New" or "Revised" License

Replace "data space" in docstrings with new definitions #809

Open cbkerr opened 2 years ago

cbkerr commented 2 years ago

Description

"Data space" is an ill-defined term.

What are the differences in semantics between "project workspace" and "project data space"? Or are they synonyms?

@joaander I've found 2 definitions of "data space" in the docs:

Synonymous: all files stored in the jobs in the workspace of the project - (source: Project.create_linked_view)

Not synonymous: the abstract set of all initialized state points (source: Concepts Page). This is in the process of being fixed in https://github.com/glotzerlab/signac-docs/issues/120.

The following link implies that the data space can be changed by changing the value of a state point: https://docs.signac.io/en/latest/examples/notebooks/signac_104_Modifying_the_Data_Space.html

This contrasts with the first definition: changing the state point (and therefore the job id) does not remove the files stored in that job directory, and therefore wouldn't change a linked view (like the one you get from the signac view command). See https://github.com/glotzerlab/signac/issues/743#issuecomment-1099773058
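To make the contrast concrete, here is a toy model using only the standard library. It is not signac's implementation: the hashing scheme, the `statepoint.json` file name, and every function name below are assumptions made for illustration. It shows that changing a state point changes the "set of state points" reading of data space while leaving the "files in the workspace" reading untouched.

```python
import hashlib
import json
import tempfile
from pathlib import Path

STATEPOINT_FILE = "statepoint.json"  # hypothetical name, chosen for this sketch


def job_id(statepoint: dict) -> str:
    # Assumption: the job id is a hash of the canonical JSON of the state point.
    encoded = json.dumps(statepoint, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


def init_job(workspace: Path, statepoint: dict) -> Path:
    # Create the job directory and record its state point inside it.
    path = workspace / job_id(statepoint)
    path.mkdir(parents=True)
    (path / STATEPOINT_FILE).write_text(json.dumps(statepoint))
    return path


def change_statepoint(workspace: Path, old: dict, new: dict) -> Path:
    # The directory is renamed to the new id; the files inside stay put.
    new_path = workspace / job_id(new)
    (workspace / job_id(old)).rename(new_path)
    (new_path / STATEPOINT_FILE).write_text(json.dumps(new))
    return new_path


def statepoints(workspace: Path) -> list:
    # Definition 2: the set of all initialized state points.
    found = [json.loads((p / STATEPOINT_FILE).read_text()) for p in workspace.iterdir()]
    return sorted(found, key=lambda sp: json.dumps(sp, sort_keys=True))


def data_files(workspace: Path) -> list:
    # Definition 1: all files stored in the jobs in the workspace.
    return sorted(
        f.name
        for p in workspace.iterdir()
        for f in p.iterdir()
        if f.name != STATEPOINT_FILE
    )


with tempfile.TemporaryDirectory() as tmp:
    workspace = Path(tmp)
    job_path = init_job(workspace, {"a": 1})
    (job_path / "result.txt").write_text("42")

    sps_before, files_before = statepoints(workspace), data_files(workspace)
    change_statepoint(workspace, {"a": 1}, {"a": 2})
    sps_after, files_after = statepoints(workspace), data_files(workspace)

print(sps_before != sps_after)      # the set of state points changed
print(files_before == files_after)  # the stored files did not
```

Under the first definition the data space is the same before and after the change; under the second it is not, which is the ambiguity described above.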

How to fix

Replace with "workspace" if it means the files in the workspace of a project.

Replace with ???? if it refers to the abstract idea of all data that could go in a project.

For context, see:

vyasr commented 2 years ago

@cbkerr have you given this issue any further thought?

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

cbkerr commented 1 year ago

Definition of data space in project.rst: "the underlying data generated and manipulated by these operations."

"A signac project is a conceptual entity consisting of three components:

(1) a data space, (2) scripts and routines that operate on that space, and (3) the project’s documentation.

This division corresponds largely to the definition of a computational project outlined by Wilson et al. The primary function of signac is to provide a single interface between component (2), the scripts encapsulating the project logic, and component (1), the underlying data generated and manipulated by these operations."

I found an implied definition in jobs.rst:

"In general an instance of :py:class:Job only gives you a handle to a Python object. To create the underlying workspace directory and thus make the job part of the data space, you must initialize it."
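The handle-versus-initialized distinction in that passage can be sketched with a small stand-in class. This is a hypothetical `ToyJob` built on the standard library, not signac's `Job`; the id scheme and the `initialized_ids` helper are assumptions for illustration only.

```python
import hashlib
import json
import tempfile
from pathlib import Path


class ToyJob:
    """Hypothetical stand-in for a job handle; not signac's Job class."""

    def __init__(self, workspace: Path, statepoint: dict):
        self.statepoint = statepoint
        # Assumption: the id is a hash of the canonical JSON of the state point.
        encoded = json.dumps(statepoint, sort_keys=True).encode()
        self.id = hashlib.sha256(encoded).hexdigest()
        self.path = workspace / self.id

    def init(self) -> None:
        # In this sketch, creating the directory is what makes the job
        # part of the data space.
        self.path.mkdir(parents=True, exist_ok=True)


def initialized_ids(workspace: Path) -> set:
    """Return the ids of all jobs that have a directory in the workspace."""
    if not workspace.is_dir():
        return set()
    return {p.name for p in workspace.iterdir() if p.is_dir()}


with tempfile.TemporaryDirectory() as tmp:
    workspace = Path(tmp) / "workspace"
    job = ToyJob(workspace, {"a": 1})

    before = job.id in initialized_ids(workspace)  # only a Python handle so far
    job.init()
    after = job.id in initialized_ids(workspace)   # directory now exists

print(before, after)
```

Read this way, jobs.rst ties "data space" to the existence of workspace directories, which is closer to the "workspace" meaning than to the abstract one.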