apecs-org / Polar-EO-Database

Polar Earth Observation Database of satellite sensors
GNU General Public License v3.0

Researcher-dependent VS researcher-independent information #40

Open Azamattf opened 1 year ago

Azamattf commented 1 year ago

Necessity

I think we should divide the information we receive from the community into two categories:

  1. User-dependent information: Information about how researchers use the data

    • Scientific field
    • Physical variables derived
    • Software used
    • Region/object of study
    • etc.
  2. User-independent information: All data related to tech specs of the data set/sensor.

    • Sensor name and type
    • Satellite active period
    • Temporal and regional coverage
    • Temporal and spatial resolution
    • Data access platform
    • Data accessibility (open access/commercial)
    • etc.

Why is it important?

Once we have collected and validated a sensor's tech specs, they are not subject to change unless somebody spots a typo or a mistake. Scientific applications, on the other hand, are user- or researcher-specific, and one data set can have many of them. Therefore, we can work with 2 types of files:

  1. [data_set_name]_techspecs: only one file per sensor (like we have currently)
  2. [data_set_name]_application_001, [data_set_name]_application_002, and so on: multiple files per sensor, one for each application

This separation would also help us:

In the end, the code would compile a third type of file for each dataset/sensor that contains all of its information: one file per sensor, probably called [data_set_name]_index. All these index files can then be sent to Google Sheets.
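To make the compile step concrete, here is a rough sketch of how it could work. It is only an illustration, not the project's actual code: it assumes the files are stored as JSON in a single folder, and the function names `build_indexes` and `write_indexes` as well as the layout of the index file are made up for this example.

```python
import json
import re
from pathlib import Path

# Naming scheme taken from the proposal above; the .json extension is an assumption.
APPLICATION_RE = re.compile(r"^(?P<name>.+)_application_(?P<num>\d+)$")
TECHSPECS_SUFFIX = "_techspecs"


def build_indexes(folder):
    """Return {data_set_name: {"techspecs": dict or None, "applications": [dict, ...]}}."""
    indexes = {}
    # Sorting keeps application_001, application_002, ... in order.
    for path in sorted(Path(folder).glob("*.json")):
        stem = path.stem
        match = APPLICATION_RE.match(stem)
        if stem.endswith(TECHSPECS_SUFFIX):
            name, kind = stem[: -len(TECHSPECS_SUFFIX)], "techspecs"
        elif match:
            name, kind = match.group("name"), "application"
        else:
            continue  # skip anything else, e.g. previously written _index files
        entry = indexes.setdefault(name, {"techspecs": None, "applications": []})
        content = json.loads(path.read_text(encoding="utf-8"))
        if kind == "techspecs":
            entry["techspecs"] = content
        else:
            entry["applications"].append(content)
    return indexes


def write_indexes(folder):
    """Write one [data_set_name]_index.json per data set into the same folder."""
    for name, entry in build_indexes(folder).items():
        out = Path(folder) / f"{name}_index.json"
        out.write_text(json.dumps(entry, indent=2), encoding="utf-8")
```

Calling `write_indexes("some_folder")` would then produce one index file per sensor, each holding the single tech specs record plus the list of application records, which could be flattened into rows for the spreadsheet in a later step.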

What do you guys think?

AdrienWehrle commented 1 year ago

Hi @Azamattf! Thanks for your work on this!

I'm not sure I see the benefit in this, compared to the complexity it is adding. In your opinion, what would be the main gain of integrating such a structure? To me all fields are similar in the end; some will change more than others, but I don't see an issue in this.

Azamattf commented 1 year ago

Hi @AdrienWehrle, thanks for the reply. As we know, our database is not the only one in the world that aims to collect information about satellite datasets. But our project is likely to be unique because we are emphasizing the applications of datasets. Also, the scientific applications will most likely be different for each user/researcher. On the other hand, the tech specs of sensors are the same for everyone (user-independent).

For ECRs: considering this, like I mentioned in my post, the proposed structure would address our objective of giving ECRs a tool to help them with both tech specs and aggregate information on scientific applications (see the original post).

For Contributors: I think it would be inefficient to gather tech specs every time a Contributor submits template-based information, because such info is user-independent. Collecting it repeatedly could be a waste of time for both the Contributor and the project team (the team still has to verify the sensor tech specs, but only needs to do it once). A Contributor may want to add new information about tech specs (if not already included in the database), about applications, or both.

Hope that helps :)