cpritcha closed 8 years ago
I'm still not sure I understand the need for the ProjectPath model. Internally, after the user has confirmed the appropriate partitioning of datasets into DataTableGroups, we could store all the data files associated with a given DataTableGroup in a single folder and expose them as needed to DeployR / Radiant.
ProjectPath Table
DataTables are Many to Many
A shapefile consists of a .dbf, a .shp, a .prj and potentially some other file formats (.shx). This would mean that one DataTable has many paths. Some raster formats also consist of many files.
A sqlite database consists of a single file but contains many tables, so in the metadata database that single path would correspond to multiple DataTables and DataTableGroups.
Since we are only dealing with csv files at the moment, this could be handled later.
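To make the many-to-many relationship concrete, here is a rough sketch using plain Python dataclasses. It is not the project's actual model code; the field names and the example file paths are assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ProjectPath:
    """One file on disk, relative to the project root (hypothetical shape)."""
    path: str


@dataclass
class DataTable:
    """One logical table; it may be backed by several files."""
    name: str
    paths: set = field(default_factory=set)


# A shapefile: one DataTable, many paths.
roads = DataTable("roads", {
    ProjectPath("gis/roads.shp"),
    ProjectPath("gis/roads.dbf"),
    ProjectPath("gis/roads.prj"),
    ProjectPath("gis/roads.shx"),
})

# A sqlite database: one path, many DataTables.
db_file = ProjectPath("data/survey.sqlite")
households = DataTable("households", {db_file})
members = DataTable("members", {db_file})


def tables_for(path, tables):
    """Return the names of every DataTable that references the given path."""
    return sorted(t.name for t in tables if path in t.paths)
```

With csv files only, every DataTable would hold exactly one path, which is why this could be deferred.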
When a user enters metadata they need to know what metadata they have to enter. With a paths table (that lists all the files in the project) this is relatively easy. We can select all the paths in the paths table that are not referenced by a DataTable or Analysis. It is also easy to ignore paths so the user is not pestered to complete metadata about a file path that should not have any.
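The selection described here is essentially a set difference. A minimal sketch, assuming paths are plain strings and that the referenced and ignored lists come from wherever the metadata is stored (the function name and sample paths are hypothetical):

```python
def paths_needing_metadata(all_paths, referenced_paths, ignored_paths):
    """Paths in the project that no DataTable or Analysis references and
    that the user has not explicitly ignored."""
    return sorted(set(all_paths) - set(referenced_paths) - set(ignored_paths))


all_paths = ["data/a.csv", "data/b.csv", "src/model.R", "README.md"]
referenced = ["data/a.csv", "src/model.R"]  # claimed by a DataTable or Analysis
ignored = ["README.md"]                     # user marked as needing no metadata

todo = paths_needing_metadata(all_paths, referenced, ignored)
```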
If a user decides to split a DataTableGroup into pieces no special functionality is needed. Just create a DataTableGroup without any DataTables, move some DataTables to the DataTableGroup and either manually enter metadata or extract metadata from one of the DataTables. No special functionality for splitting a DataTable either. Just create a new DataTable and move any relevant ProjectPaths to it.
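A sketch of what splitting a DataTableGroup amounts to under this scheme, again with stand-in dataclasses rather than the real models (the class fields and the split_group helper are assumptions):

```python
from dataclasses import dataclass, field


@dataclass
class DataTable:
    name: str


@dataclass
class DataTableGroup:
    name: str
    metadata: dict = field(default_factory=dict)
    tables: list = field(default_factory=list)


def split_group(source, new_name, table_names):
    """Create an empty group and move the named tables from source into it."""
    new_group = DataTableGroup(new_name)
    moving = [t for t in source.tables if t.name in table_names]
    source.tables = [t for t in source.tables if t.name not in table_names]
    new_group.tables.extend(moving)
    return new_group


surveys = DataTableGroup("surveys", tables=[
    DataTable("2014.csv"), DataTable("2015.csv"), DataTable("prices.csv"),
])
prices = split_group(surveys, "prices", {"prices.csv"})
```

The new group starts with empty metadata, which would then be entered manually or extracted from one of its DataTables.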
Without a paths table I see two possibilities.
[...] pending status.
[...] DataTableGroup (because the metadata extractor did not group the files correctly) this would involve creating a new DataTableGroup and changing the DataTableGroup foreign key on some of the DataTables to point to the new DataTableGroup. Splitting DataTables is a non-issue because it is not possible for a DataTable to have many paths (which will be limiting if we want to support shapefiles or any format where multiple files form a unit).
[...] DataTable and Analysis tables). This still does not solve the issue of whether or not a particular path should be ignored, but an ignored file list could be kept in a separate table.
[...] DataTableGroup and DataTable is identical to the approach taken when using a ProjectPath table (although diffing between the filesystem paths and the database paths is done on demand).
If ProjectPath is used to store all paths in the project, I think it is needed. We need a way to identify the DataTableGroup for each individual file (and thus identify the proper metadata), and it may not be feasible to group them together into the same folder (file name collisions, for example, if they are not pre-grouped by users).
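For reference, the on-demand diffing between filesystem paths and database paths mentioned earlier in the thread could look roughly like this. It is only a sketch: it assumes database paths are stored relative to the project root with forward slashes, and the function names are made up.

```python
import os
import tempfile


def list_project_files(root):
    """Walk root and return every file path relative to it, with / separators."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            rel = os.path.relpath(os.path.join(dirpath, filename), root)
            found.add(rel.replace(os.sep, "/"))
    return found


def diff_paths(root, db_paths):
    """Return (untracked, missing): files on disk but not in the database,
    and database paths with no file on disk."""
    on_disk = list_project_files(root)
    db_paths = set(db_paths)
    return sorted(on_disk - db_paths), sorted(db_paths - on_disk)


# Small demonstration against a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "data"))
    for name in ("a.csv", "b.csv"):
        open(os.path.join(root, "data", name), "w").close()
    untracked, missing = diff_paths(root, {"data/a.csv", "data/old.csv"})
```

Here untracked holds files that still need metadata (or an ignore entry) and missing holds database paths whose files have gone away.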
Ok, you both make a good case for keeping track of the actual files belonging to a DataTableGroup. Let's discuss some potential refactoring of this over the call this morning.
- Corrected git submodule to point to correct dependency
- Updated docker-compose with new paths
- Removed old test project structure skeleton
- Added basic metadata upload support
Still to do:
- [...] serializers.py and urls.py)