AssessingSolar / solarstations

A catalog of high-quality solar radiation monitoring stations.
https://SolarStations.org
BSD 3-Clause "New" or "Revised" License

Metadata formatting #111

Open AdamRJensen opened 7 months ago

AdamRJensen commented 7 months ago

I've opened this PR to discuss how the metadata should be specified.

Specifically, I mean the three columns Tier, Instrument, and Components. Right now, the Instrument column denotes either "Thermopile" or SPN1/RSR, and the thermopile components measured are specified in the Components column (e.g., G;B;D). This is somewhat inconsistent, as the SPN1 also uses thermopiles...

Perhaps these two columns could be combined into one. For example, a station with an RSR plus a traditional unshaded pyranometer (GHI) and a pyrheliometer (DNI) would have the entry G;B;RSR in the Instrumentation column.

Also, given the definition of Tier 1 vs. Tier 2, it should be possible to determine whether a station is Tier 1 by checking if G;B;D is listed in the Components column. So perhaps the Tier column should be derived rather than hardcoded. Thoughts?

AdamRJensen commented 7 months ago

As per @IoannisSifnaios's comment:

I think it is a good idea to add a third status called "unknown". For example, I could not find information on whether the Saudi Arabian or the Chilean networks are still in operation, and by default they would be categorized as "active". However, that might require going over the stations again to make sure that they are in the correct category...

I think that is a good idea. Perhaps we can use a question mark? E.g., 2012-2014&2018-? Thoughts @kandersolar?
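To make the proposed date notation concrete, here is a minimal sketch of how a string like 2012-2014&2018-? could be parsed. It assumes the conventions implied above: periods are joined with "&", an open end ("2018-") means the station is assumed active, and "?" means the end date is unknown. The status label "inactive" for closed periods and the helpers `parse_periods` and `station_status` are hypothetical, not part of the catalog.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: (start_year, end_year or None, status)
Period = Tuple[int, Optional[int], str]


def parse_periods(text: str) -> List[Period]:
    """Parse an operating-period string such as '2012-2014&2018-?'."""
    periods: List[Period] = []
    for part in text.split("&"):
        start_str, _, end_str = part.strip().partition("-")
        start = int(start_str)
        if end_str == "?":
            # End date unknown: the station may or may not still operate
            periods.append((start, None, "unknown"))
        elif end_str == "":
            # Open-ended period: the station is assumed to be active
            periods.append((start, None, "active"))
        else:
            # Closed period with a known end year (label is an assumption)
            periods.append((start, int(end_str), "inactive"))
    return periods


def station_status(text: str) -> str:
    """Take the overall status from the period with the latest start year."""
    latest = max(parse_periods(text), key=lambda period: period[0])
    return latest[2]
```

With this sketch, `station_status("2012-2014&2018-?")` reports "unknown", while `station_status("2018-")` reports "active", matching the default described in the quoted comment.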

kandersolar commented 7 months ago

An alternative approach: indicate instrumentation type and component presence/absence in a combined form. Something like this:

Station | DNI | DHI | GHI
Stn 1 | pyrheliometer | shadowball + thermopile | thermopile
Stn 2 | N/A | SPN1 | SPN1

AdamRJensen commented 7 months ago

Your example still contains some duplicate information (not that that is necessarily a problem). My main issue is that it makes it hard to denote additional measurements, e.g., UV and IR, as is sometimes done (though I suppose these could be put in the comments).

I still favor having a single instrumentation column, like this:

Station | Instruments
Stn 1 | G;B;SPN1;IR
Stn 2 | G;D
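If the catalog is stored as a comma-separated file (an assumption here), separating the codes with semicolons, as in the G;B;RSR example earlier in this thread, keeps each station's instrument list inside a single CSV field. The sketch below reads a hypothetical two-row excerpt with Python's standard `csv` module; the column names and the helper `read_instruments` are illustrative only.

```python
import csv
import io

# Hypothetical catalog excerpt; column names are assumptions for illustration
CATALOG = """Station,Instruments
Stn 1,G;B;SPN1;IR
Stn 2,G;D
"""


def read_instruments(text: str) -> dict:
    """Map each station name to its list of instrument/component codes."""
    reader = csv.DictReader(io.StringIO(text))
    return {
        row["Station"]: [code.strip() for code in row["Instruments"].split(";")]
        for row in reader
    }
```

A comma-separated list such as "G, B, SPN1, IR" would instead have to be quoted in a comma-delimited file, which is one practical reason to prefer semicolons inside the field.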

@kandersolar thoughts about using a question mark for the dates?

Tier will then be determined as follows:

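    # Assumes `stations` is a list of dicts, each with an 'Instruments'
    # entry of semicolon-separated codes, e.g. 'G;B;D'
    for row in stations:
        instruments = [code.strip() for code in row['Instruments'].split(';')]
        # Tier 1 requires GHI (G), DNI (B), and DHI (D) to all be measured
        if {'G', 'B', 'D'}.issubset(instruments):
            row['Tier'] = 1
        else:
            row['Tier'] = 2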

Additionally, diffuse irradiance measured with a shadowband needs to be marked differently from diffuse irradiance measured on a tracker. D will be used for tracker-mounted measurements (highest quality) and Ds for diffuse measurements with a shadowband or shadowring.
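A small, self-contained sketch of the Tier rule with the D/Ds distinction folded in. It assumes the semicolon-separated codes proposed in this thread; the names `parse_codes` and `derive_tier` are hypothetical. Because the check looks for the exact code D, a station whose only diffuse measurement is Ds is classed as Tier 2 in this sketch.

```python
# Tier 1 components: GHI (G), DNI (B) and tracker-mounted diffuse (D)
TIER1_COMPONENTS = {"G", "B", "D"}


def parse_codes(instruments: str) -> set:
    """Split a semicolon-separated Instruments entry into a set of codes."""
    return {code.strip() for code in instruments.split(";") if code.strip()}


def derive_tier(instruments: str) -> int:
    """Return 1 if G, B and D are all present, otherwise 2.

    Shadowband/shadowring diffuse is recorded as 'Ds', a distinct code,
    so it does not satisfy the 'D' requirement here.
    """
    return 1 if TIER1_COMPONENTS <= parse_codes(instruments) else 2
```

For example, `derive_tier("G;B;D;IR")` gives 1, while `derive_tier("G;B;Ds")` gives 2. Using a set comparison means the order of the codes and any extra instruments (SPN1, IR, UV) do not affect the result.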

kandersolar commented 7 months ago

Question mark indicating uncertain end date makes sense to me!