geodesymiami / insarmaps


Ingest of CSV file #118

Open falkamelung opened 3 days ago

falkamelung commented 3 days ago

Below is an example CSV/XLS file we want to ingest. I'm not sure which is better, CSV or XLS. XLS is handy because you can open and modify it in a spreadsheet. I added the metadata manually. Are these the critical metadata for the ingest to work? This file is produced by different software (sarvey), which starts from miaplpy data products. I am just getting started with this. Once we have decided on the format and confirmed that the ingest works, I will create a Python script to generate these files as part of the sarvey workflow.

The key parameter which we were not yet able to properly examine is the estimated elevation. If it agrees with the real elevation, that means this is a reliable pixel. I will probably add another column, lidar_elevation. If that exists, it should be displayed as well.
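For illustration, a minimal pandas sketch of that reliability check (the column names estimated_elevation and lidar_elevation and the 5 m tolerance are assumptions, not the final CSV schema):

```python
import pandas as pd

# Sketch of the check described above: a pixel is considered reliable
# when the InSAR-estimated elevation agrees with the lidar elevation.
# Column names and the 5 m threshold are assumptions.
df = pd.read_csv("SunnyIslesSenA48_20190101-20233110.csv")

if "lidar_elevation" in df.columns:
    df["reliable"] = (df["estimated_elevation"] - df["lidar_elevation"]).abs() < 5.0
```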

[image: the example CSV file]

Here the "needed" attributes in hdf5*_2_json_mbtiles.py. Many of them don't seem critical. Can we just say unknown for now? I will add add them once I fine them. But it will be good to make it work with as few needed data as possible.

needed_attributes = {
    "prf", "first_date", "mission", "WIDTH", "X_STEP", "processing_software",
    "wavelength", "processing_type", "beam_swath", "Y_FIRST", "look_direction",
    "flight_direction", "last_frame", "post_processing_method", "min_baseline_perp"
    "unwrap_method", "relative_orbit", "beam_mode", "LENGTH", "max_baseline_perp",
    "X_FIRST", "atmos_correct_method", "last_date", "first_frame", "frame", "Y_STEP", "history",
    "scene_footprint", "data_footprint", "downloadUnavcoUrl", "referencePdfUrl", "areaName", "referenceText",
    "REF_LAT", "REF_LON", "CENTER_LINE_UTC", "insarmaps_download_flag", "mintpy.subset.lalo"
}
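One possible way to "just say unknown" for now, as a sketch (the metadata dict and its contents are hypothetical; needed_attributes is the set above):

```python
# Hypothetical metadata dict parsed from the CSV header rows.
metadata = {"mission": "Sentinel-1", "relative_orbit": "48"}

# Default every attribute that was not provided to "unknown" so the
# ingest can proceed with minimal metadata.
attributes = {key: metadata.get(key, "unknown") for key in needed_attributes}
```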

SunnyIslesSenA48_20190101-20233110.csv SunnyIslesSenA48_20190101-20233110.xlsx

stackTom commented 3 days ago

I'm confused. So we need to ingest csv files on top of h5 files now? Can't this data just be put inside the h5 files as extra attributes? We already have a system for ingesting extra attributes.
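For reference, a minimal h5py sketch of that route (file name and attribute values are placeholders):

```python
import h5py

# Sketch: store the CSV's metadata as root-level HDF5 attributes,
# which the existing extra-attribute ingest could then pick up.
# File name and values are placeholders.
with h5py.File("SunnyIslesSenA48.he5", "a") as f:
    f.attrs["areaName"] = "SunnyIsles"
    f.attrs["mission"] = "Sentinel-1"
```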

falkamelung commented 3 days ago

It would be good to have the ability to ingest CSV. The alternative is to convert a CSV into an HDF5EOS file, but this is not smart, as nobody uses HDF5EOS. But I can do this myself; your time is better spent on InSARmaps. When adding the checks to hdf*2json_mbtiles we should just keep this in mind.

I am just not sure what is better: create a new ingest script (csv_2json_mbtiles.py) or add a --csv option to the current script. I think I prefer the second, even though the script name then no longer matches.
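A sketch of the second option, assuming the script uses argparse (the flag name and the two ingest functions are illustrative, not the current code):

```python
import argparse

def ingest_csv(path):
    ...  # hypothetical: parse CSV rows into the JSON/mbtiles pipeline

def ingest_hdf5(path):
    ...  # existing HDF5 ingest path

parser = argparse.ArgumentParser(description="ingest InSAR products into insarmaps")
parser.add_argument("file", help="input HDF5 or CSV file")
parser.add_argument("--csv", action="store_true",
                    help="treat the input as CSV instead of HDF5")
args = parser.parse_args()

(ingest_csv if args.csv else ingest_hdf5)(args.file)
```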

stackTom commented 3 days ago

The second is probably better. I am surprised CSVs are frequently used; see my reply here: https://github.com/geodesymiami/insarmaps/issues/117#issuecomment-2425220900. H5 files seem much better suited to this large amount of data than CSV files, which are rudimentary and inefficient.

stackTom commented 3 days ago

Here the "needed" attributes in hdf5*_2_json_mbtiles.py. Many of them don't seem critical. Can we just say unknown for now? I will add add them once I fine them. But it will be good to make it work with as few needed data as possible.



Here "needed" doesn't mean "critical or necessary". At the time, I meant "these are the ones we should have on the database on the site". Ambiguous naming, I know.

Off the top of my head, some of the critical or necessary ones are scene_footprint and data_footprint; otherwise the site has no way of showing the swaths. areaName might also be critical. I will just not display the datasets missing this info so the site doesn't crash. I am just a little confused why some ingests are missing this info now, when they haven't been for the past 7-8 years.

falkamelung commented 3 days ago

Yes, so maybe just separate them into needed_attributes and optional_attributes. If a needed one is missing, the script should exit.
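A minimal sketch of that split (the exact membership of the two sets is still to be decided; the sets shown are illustrative):

```python
import sys

# Attributes without which the site cannot display the dataset
# (scene_footprint, data_footprint, and areaName per the comment above).
needed_attributes = {"scene_footprint", "data_footprint", "areaName"}
# Everything else defaults to "unknown"; illustrative subset only.
optional_attributes = {"prf", "beam_swath", "history"}

def check_attributes(metadata):
    missing = needed_attributes - metadata.keys()
    if missing:
        sys.exit(f"missing required attributes: {sorted(missing)}")
    for key in optional_attributes:
        metadata.setdefault(key, "unknown")
    return metadata
```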