DIGGSml / Geophysics

A repository for resources and development of DIGGS schema extensions for geophysics

General Topic: Developing metadata objects to support transfer of large external files #3

Open dponti opened 1 year ago

dponti commented 1 year ago

Transferring data acquisition results as ASCII data within the XML may be inefficient for some techniques, especially where the data acquisition software stores its data in binary format. One of DIGGS' primary goals is to give an end user enough information about the measurement to assess the efficacy of the final result and/or to reprocess the raw data if need be.
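
To put a rough number on the ASCII versus binary point, here is a minimal sketch comparing the size of the same samples written as text and as packed 32-bit floats. The function name, the `.6e` text format, and the choice of float32 are assumptions for illustration only; real acquisition formats will differ.

```python
import struct


def ascii_vs_binary_size(samples):
    """Return (ascii_bytes, binary_bytes) for the same list of float samples."""
    # ASCII: space-separated values in scientific notation (assumed format).
    ascii_payload = " ".join(f"{s:.6e}" for s in samples).encode("ascii")
    # Binary: little-endian IEEE 754 float32, 4 bytes per sample (assumed format).
    binary_payload = struct.pack(f"<{len(samples)}f", *samples)
    return len(ascii_payload), len(binary_payload)


if __name__ == "__main__":
    samples = [i * 0.001 for i in range(1000)]
    ascii_len, binary_len = ascii_vs_binary_size(samples)
    print(f"ASCII: {ascii_len} bytes, binary: {binary_len} bytes")
```

The text form here is about three times larger before any XML markup is added around it, and the float32 form keeps slightly more precision than the seven significant digits written to text. The actual ratio depends on the precision and encoding chosen.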

Given that some approaches typically generate large data files during acquisition, how do we best allow users to access this data?
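
One possible direction is to keep the large file external and carry only a small descriptor in the XML. The sketch below builds such a descriptor with a location, media type, size, and checksum so a recipient can find and verify the file. The element names (`ExternalFile`, `uri`, `mediaType`, `byteLength`, `sha256`) and the function name are hypothetical and are not part of the DIGGS schema.

```python
import hashlib
import xml.etree.ElementTree as ET
from pathlib import Path


def describe_external_file(path, uri, media_type):
    """Build a hypothetical <ExternalFile> element describing a file on disk."""
    data_path = Path(path)

    # Hash in 1 MiB chunks so large files are never fully loaded into memory.
    digest = hashlib.sha256()
    with data_path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)

    # All element names below are illustrative assumptions, not DIGGS names.
    elem = ET.Element("ExternalFile")
    ET.SubElement(elem, "uri").text = uri
    ET.SubElement(elem, "mediaType").text = media_type
    ET.SubElement(elem, "byteLength").text = str(data_path.stat().st_size)
    ET.SubElement(elem, "sha256").text = digest.hexdigest()
    return elem
```

Calling `ET.tostring(describe_external_file(...), encoding="unicode")` gives the XML fragment. The checksum and byte length let a recipient confirm that the file they retrieved is the one the XML refers to.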

Are there standard formats we should reference or support?

Can SEG-Y be used for universal binary transfer, as opposed to proprietary formats?

nickmachairas commented 1 year ago

Avro and Parquet are popular file formats for large datasets. SEG-Y seems to be geared towards seismic data only, but I don't know much about it.

How large does a data file need to be to warrant looking into alternatives? A 500 MB XML file would have been a problem some time ago, but nowadays most laptops can easily parse files of that size.
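
As a rough illustration of parsing a large XML file without holding the whole tree in memory, here is a minimal sketch using the standard library's `xml.etree.ElementTree.iterparse`. The function name and the `reading` tag in the usage example are placeholders, not DIGGS element names.

```python
import io
import xml.etree.ElementTree as ET


def count_elements(source, tag):
    """Stream through an XML file or file-like object and count `tag` elements."""
    count = 0
    # "end" events fire once an element and all of its children are parsed.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            count += 1
            # Drop the element's text and children; the emptied element itself
            # stays attached to its parent, so memory is reduced, not zeroed.
            elem.clear()
    return count


if __name__ == "__main__":
    # Placeholder document; a real file path can be passed instead.
    doc = io.BytesIO(b"<root><reading>1</reading><reading>2</reading></root>")
    print(count_elements(doc, "reading"))
```

The same function accepts a filename, so it can be pointed at a real file to see how parse time and memory behave at a given size.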

Are there issues other than compute and transfer with large files?