Overall, KGE File Set compilation is now largely managed by the `KgeaArchiveCatalog` class in the `kgea.session.catalog.Catalog` module. Further iterations will refine this.
@RichardBruskiewich will confer with a few Translator collaborators regarding their practical preferences for download archive structure and behaviour.
We have converged on building and using a single normalized tar.gz archive for downloading.
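For concreteness, here is a minimal Python sketch of what building that normalized archive could look like, using the standard library `tarfile` module; the function name, file layout, and `metadata.json` component are illustrative assumptions, not the actual KGEA implementation:

```python
import tarfile

def build_fileset_archive(fileset_name: str, nodes_path: str,
                          edges_path: str, metadata_path: str) -> str:
    """Bundle normalized KGX components into a single tar.gz archive."""
    archive_path = f"{fileset_name}.tar.gz"
    with tarfile.open(archive_path, "w:gz") as tar:
        # Store components under one top-level folder with standardized
        # names, regardless of the originally uploaded file names.
        tar.add(nodes_path, arcname=f"{fileset_name}/nodes.tsv")
        tar.add(edges_path, arcname=f"{fileset_name}/edges.tsv")
        # Assumed metadata component; actual name/format may differ.
        tar.add(metadata_path, arcname=f"{fileset_name}/metadata.json")
    return archive_path
```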
If files are uploaded in small chunks, how does the system cross-link and connect the data?
A partial answer is that the core “Register File Set” web site page is the control point that initiates a given file upload session; all files uploaded during that session are aggregated into one “KGE File Set”. The upload page has a “Done Uploading File Set” button for closing the session. The application back end manages the integration of the uploaded datasets (with metadata) into a single archived file set.
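To illustrate those session mechanics, here is a hedged sketch in which every uploaded file is keyed under a per-session S3 prefix, so that the “Done Uploading File Set” step can enumerate everything belonging to the session for integration. The bucket name, key layout, and helper names are assumptions, not KGEA's actual code:

```python
import uuid
import boto3

s3 = boto3.client("s3")
BUCKET = "kge-archive"  # hypothetical bucket name

def open_upload_session(kg_id: str) -> str:
    """Start an upload session; its ID doubles as an S3 key prefix."""
    return f"{kg_id}/{uuid.uuid4()}"

def upload_file(session_id: str, filename: str, fileobj) -> None:
    """Store one uploaded file under the session's key prefix."""
    s3.upload_fileobj(fileobj, BUCKET, f"{session_id}/{filename}")

def close_upload_session(session_id: str) -> list:
    """On 'Done Uploading File Set': list the session's files for integration."""
    response = s3.list_objects_v2(Bucket=BUCKET, Prefix=session_id)
    return [obj["Key"] for obj in response.get("Contents", [])]
```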
That said, a few questions remain:
Since AWS S3 doesn't severely limit file sizes (at least, not at the anticipated knowledge graph dataset sizes), should incoming data be merged into standard file components, perhaps renamed, for greater back end uniformity of access?
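One possible answer that avoids pulling data through the back end is S3's server-side multipart copy, which can concatenate previously uploaded objects into a single renamed, standardized object. A sketch under that assumption (note S3 requires every part except the last to be at least 5 MB; bucket and key names are hypothetical):

```python
import boto3

s3 = boto3.client("s3")

def merge_objects(bucket: str, source_keys: list, target_key: str) -> None:
    """Concatenate source objects into one target object, server-side."""
    mpu = s3.create_multipart_upload(Bucket=bucket, Key=target_key)
    parts = []
    for i, key in enumerate(source_keys, start=1):
        # Copy each source object in place as one part of the target.
        result = s3.upload_part_copy(
            Bucket=bucket, Key=target_key,
            UploadId=mpu["UploadId"], PartNumber=i,
            CopySource={"Bucket": bucket, "Key": key},
        )
        parts.append({"PartNumber": i,
                      "ETag": result["CopyPartResult"]["ETag"]})
    s3.complete_multipart_upload(
        Bucket=bucket, Key=target_key, UploadId=mpu["UploadId"],
        MultipartUpload={"Parts": parts},
    )
```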
Could the "Data File" upload form have radio buttons to select the type of file being uploaded, e.g. "node file", "edge file", etc., as a hint to the system about integration? - DONE
How should such large data be published to clients wishing to download it: in its original form (with the original file names) or in its "consolidated" form (normalized into consolidated `nodes.tsv` and `edges.tsv` files within a tar.gz archive)?

Should/could downloading be a "streaming" form of transaction or a discrete file download? Sort of solved as a workable download from the web. Q: How might this change for a CLI or program library solution?
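For the CLI or program library case, two plausible patterns are (a) handing the client a presigned URL it can fetch with curl/wget, or (b) streaming the S3 object body in fixed-size chunks so the archive is never buffered whole in memory. A sketch of both, with hypothetical bucket and key names:

```python
import boto3

s3 = boto3.client("s3")

def get_download_url(bucket: str, key: str, expires: int = 3600) -> str:
    """Presigned URL a CLI could simply fetch with curl or wget."""
    return s3.generate_presigned_url(
        "get_object", Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires,
    )

def stream_archive(bucket: str, key: str, out_path: str,
                   chunk_size: int = 8 * 1024 * 1024) -> None:
    """Stream the tar.gz archive to disk in 8 MB chunks."""
    body = s3.get_object(Bucket=bucket, Key=key)["Body"]
    with open(out_path, "wb") as out:
        for chunk in iter(lambda: body.read(chunk_size), b""):
            out.write(chunk)
```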