Closed by romanchyla 6 years ago
Updated. Please note that if several pipelines write into the same column, the datatype stored in that column will have to reflect the provenance. That is doable, but it is something to keep in mind.
Also, the timestamps will not record processing times for the individual direct workers, which may or may not be a problem.
Matt suggests the model have new columns for direct ingest (e.g., direct_data rather than arxiv_data, direct_created, etc.). This allows all the direct parsers to share the same schema. It makes sense to me.
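To make the trade-off concrete, here is a minimal sketch of what a shared direct-ingest schema could look like. It uses the standard-library `sqlite3` module rather than the project's actual model. Only `direct_data` and `direct_created` come from this thread; the `records` table, the `bibcode` key, the `direct_updated` column, the `origin` field, and both helper functions are assumptions made for illustration.

```python
import json
import sqlite3
from datetime import datetime, timezone

# Hypothetical table: one shared set of columns for all direct parsers.
# Only direct_data and direct_created are named in the discussion.
SCHEMA = """
CREATE TABLE records (
    bibcode        TEXT PRIMARY KEY,
    direct_data    TEXT,
    direct_created TEXT,
    direct_updated TEXT
)
"""


def store_direct(conn, bibcode, origin, payload):
    """Insert or update a direct-ingest record.

    The provenance ('origin') is stored inside the column value itself,
    because several parsers write into the same column.
    """
    now = datetime.now(timezone.utc).isoformat()
    data = json.dumps({"origin": origin, "payload": payload})
    cur = conn.execute(
        "UPDATE records SET direct_data = ?, direct_updated = ? WHERE bibcode = ?",
        (data, now, bibcode),
    )
    if cur.rowcount == 0:
        # First write for this bibcode: set the creation timestamp as well.
        conn.execute(
            "INSERT INTO records (bibcode, direct_data, direct_created, direct_updated) "
            "VALUES (?, ?, ?, ?)",
            (bibcode, data, now, now),
        )
    conn.commit()


def load_direct(conn, bibcode):
    """Return the decoded direct_data value, or None if nothing is stored."""
    row = conn.execute(
        "SELECT direct_data FROM records WHERE bibcode = ?", (bibcode,)
    ).fetchone()
    return json.loads(row[0]) if row and row[0] else None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    store_direct(conn, "2018arXiv180100001X", "arxiv", {"title": "Example"})
    print(load_direct(conn, "2018arXiv180100001X"))
```

In this sketch the timestamps belong to the column, not to the parser: a later write from a different direct worker overwrites `direct_updated`, so per-worker processing times are lost, which is the caveat raised above.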