Closed michaelhoehn closed 3 years ago
Source is a property of the Dataset, so it should be specified when the Dataset object is created rather than anywhere else. I don't think it makes sense to detach it from the Dataset, as it would not apply to all the files that you can export with the File_Toolkit.
File_Toolkit is intended to simply create Files by serializing their contents. The only reason we have a UseDatasetSerialization option in the PushConfig for now is that Datasets are "improperly" serialized: although they end up in .json files, they do not respect JSON conventions. Single objects are serialized into proper JSON, but they are then concatenated without any commas, and the square brackets that would wrap the list of objects into a proper JSON array are missing. All this option does is tell the serializer to use the unusual serialization that we use for datasets instead of a proper JSON serialization. When, in the future, we make datasets proper JSON, the UseDatasetSerialization option will be removed.
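To make the difference between the two layouts concrete, here is a minimal Python sketch. It is illustrative only: File_Toolkit itself is not written in Python, the function names and sample records are hypothetical, and the assumption that nothing at all separates the concatenated objects is taken from the description above rather than from the actual files.

```python
import json


def serialize_dataset_style(items):
    """Dataset-style output: each object is valid JSON on its own,
    but the objects are concatenated with no commas and no brackets."""
    return "".join(json.dumps(item) for item in items)


def serialize_json_array(items):
    """Proper JSON: the same objects wrapped in a single array."""
    return json.dumps(items)


def read_dataset_style(text):
    """Read concatenated objects back one at a time.

    json.loads would reject this text, so we decode object by object
    with JSONDecoder.raw_decode, which returns the parsed object and
    the index where it stopped.
    """
    decoder = json.JSONDecoder()
    items, pos = [], 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here
        # in case the real files put newlines between objects.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        items.append(obj)
    return items


# Hypothetical sample records
records = [{"Name": "A"}, {"Name": "B"}]
dataset_text = serialize_dataset_style(records)
array_text = serialize_json_array(records)
```

A standard JSON parser fails on `dataset_text` because it sees extra data after the first object, while `array_text` loads directly. That is the practical cost of the current dataset layout, and why a separate reader like the one sketched here would be needed.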
Now that I think of it, I can already remove the "UseDatasetSerialization" option: instead, I can simply check what type of "content" is specified in the file. If a Dataset object is detected, then the "improper" JSON serialization will be used.
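As a rough illustration of that idea, the sketch below picks the serialization from the type of the file's content instead of from a config flag. Everything in it is hypothetical: the `Dataset` and `File` classes are simplified Python stand-ins rather than the real types, and writing the Source as a leading object in the output is purely an assumption made so that the example does not drop it.

```python
import json
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Dataset:
    # Hypothetical stand-in: a source plus a list of plain records.
    source: str
    data: List[Any] = field(default_factory=list)


@dataclass
class File:
    # Hypothetical stand-in for a file with a name and some content.
    name: str
    content: Any


def serialize_content(file):
    """Choose the serialization from the content type, with no flag."""
    if isinstance(file.content, Dataset):
        # Assumed layout: a source object followed by the records,
        # concatenated without commas or enclosing brackets.
        objects = [{"Source": file.content.source}] + list(file.content.data)
        return "".join(json.dumps(obj) for obj in objects)
    # Anything else is written as ordinary JSON.
    return json.dumps(file.content)


dataset_file = File("materials.json", Dataset("Handbook", [{"x": 1}, {"x": 2}]))
plain_file = File("notes.json", [{"x": 1}])
```

With this shape the caller never has to remember to set an option: the choice follows from what was put into the file.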
Makes sense to me. Thanks @alelom 😃
Is it worth elevating the importance of Source by adding it to the PushConfig, or as a property of the File object, especially when UseDatasetSerialization == True? I felt I had to package things up in the Dataset object prior to pushing in order to retain the Source information needed for standard datasets. It may be a broader question, but I felt that the purpose of compiling a dataset was to maintain source fidelity along with organisation. So it seems to make sense that a user who selects UseDatasetSerialization would benefit from either a prompt to add Source, or from Source simply being a property of the file itself.