ANMN METADATA harvester - missing variable attributes info

lbesnard commented 1 year ago

Following this issue https://github.com/aodn/PO-Backlog/issues/2665, I was trying to find all the ANMN files where the HEIGHT_ABOVE_SENSOR variable has a positive attribute equal to down, so we can ask the facility to reprocess them.

I was hoping to find this information within the anmn_metadata schema, but it seems that not all files get their variable attributes harvested which is a shame:

select * from anmn_metadata.variable_attribute
   where file_id =
           (select id from anmn_metadata.indexed_file
            where url like '%/IMOS_ANMN-NSW_AETVZ_20150429T070000Z_BMP070_FV01_BMP070-1504-Sentinel-or-Monitor-Workhorse-ADCP-70.9_END-20150929T014959Z_C-20160901T023%')
+-----------+-------------+--------+--------+---------+
| file_id   | container   | name   | type   | value   |
|-----------+-------------+--------+--------+---------|
+-----------+-------------+--------+--------+---------+
SELECT 0

However, global attributes are being ingested correctly:

select * from anmn_metadata.global_attribute where file_id = (select id from anmn_metadata.indexed_file where url like '%/IMOS_ANMN-NSW_AETVZ_20150429T070000Z_BMP070_FV01_BMP070-1504-Sentinel-or-Monitor-Workhorse-ADCP-70.9_END-20150929T014959Z_C-20160901T023%')

@mhidas FYI

mhidas commented 1 year ago

I was hoping to find this information within the anmn_metadata schema, but it seems that not all files get their variable attributes harvested

Sorry @lbesnard, this was by design. Back when this harvester was first created (7 years ago!), I didn't think we would want to keep all variable attributes for all ANMN & DWM files in the database. In fact, it only extracts the standard_name and long_name attributes, then clears the variable_attribute table after each file is harvested.

ggalibert commented 1 year ago

@lbesnard and @mhidas, if that is OK with you I will close this, since I don't think we are going to embark on changing this harvester now that implementation of the new infrastructure has started. In the future infrastructure, I think we agree we should systematically harvest all attributes from NetCDF files, for example with a generic harvester. @lbesnard you can still obtain the list of URLs of all relevant ADCPs and then query them one by one via OPeNDAP.
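
For the OPeNDAP route, a rough sketch of the per-file check could look like the following. It assumes the server exposes the standard DAP2 `.das` attribute response for each dataset URL; the function names are made up for illustration, and the dataset URLs themselves would have to come from the database, since none are given here.

```python
import re
import urllib.request

# Matches the attribute block of one variable in a DAP2 DAS response, e.g.
#     HEIGHT_ABOVE_SENSOR {
#         String positive "down";
#     }
VARIABLE_BLOCK_RE = re.compile(r'\bHEIGHT_ABOVE_SENSOR\s*\{(.*?)\}', re.DOTALL)
POSITIVE_ATTR_RE = re.compile(r'\bString\s+positive\s+"([^"]*)"\s*;')


def das_positive(das_text):
    """Return HEIGHT_ABOVE_SENSOR's positive attribute from DAS text, or None."""
    block = VARIABLE_BLOCK_RE.search(das_text)
    if block is None:
        return None
    attr = POSITIVE_ATTR_RE.search(block.group(1))
    return attr.group(1) if attr else None


def fetch_das(dataset_url, timeout=60):
    """Download the DAS (attributes only, no data) for one OPeNDAP dataset URL."""
    with urllib.request.urlopen(dataset_url + ".das", timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def urls_with_positive_down(dataset_urls):
    """Return the URLs whose HEIGHT_ABOVE_SENSOR variable has positive = down."""
    hits = []
    for url in dataset_urls:
        value = das_positive(fetch_das(url))
        if value is not None and value.strip().lower() == "down":
            hits.append(url)
    return hits
```

Only the attribute listing is requested for each file, so no velocity data is transferred.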

ggalibert commented 1 year ago

The pause label shows that this is relevant for the new infrastructure.

mhidas commented 1 year ago

:+1:

@lbesnard you can still obtain the list of URLs of all relevant ADCPs and then query them one by one via OPeNDAP.

Yeah that's probably the only solution. Use the anmn_velocity.anmn_velocity_timeseries_map view. You might

I would just use ncdump -h to look at the files via the /mnt/imos-data mount on pipeline-prod (instead of OPeNDAP). It'll take a while (there are 1186 files, unless you can narrow it down further based on the global attributes), but you only need to do it once.
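
A minimal sketch of that one-off scan, assuming `ncdump` is on the PATH and that the files of interest sit under a single directory of the mount (the exact subdirectory is not stated here, so `root` is left to the caller). The function names are illustrative only.

```python
import re
import subprocess
from pathlib import Path

# Attribute line as printed in the CDL header by `ncdump -h`, e.g.
#     HEIGHT_ABOVE_SENSOR:positive = "down" ;
POSITIVE_RE = re.compile(
    r'^\s*HEIGHT_ABOVE_SENSOR:positive\s*=\s*"([^"]*)"\s*;',
    re.MULTILINE,
)


def positive_direction(header_text):
    """Return the value of HEIGHT_ABOVE_SENSOR:positive, or None if absent."""
    match = POSITIVE_RE.search(header_text)
    return match.group(1) if match else None


def read_header(path):
    """Run `ncdump -h` on one file and return the header text."""
    result = subprocess.run(
        ["ncdump", "-h", str(path)],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def files_with_positive_down(root):
    """Scan every .nc file under root; return those with positive = down."""
    hits = []
    for path in sorted(Path(root).rglob("*.nc")):
        value = positive_direction(read_header(path))
        if value is not None and value.strip().lower() == "down":
            hits.append(path)
    return hits
```

Calling `files_with_positive_down(...)` on the relevant directory under /mnt/imos-data returns the paths to pass on to the facility. To scan only the files listed by the database instead of a whole directory, the `rglob` loop could be swapped for a loop over that list.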

aodn / content

ANMN METADATA harvester - missing variable attributes info #516