datacommonsorg / import

Tools and pipelines for importing data into the Data Commons Knowledge Graph.
Apache License 2.0

Option to drop warnings for reference checks for properties with known Text values #66

Open ajaits opened 2 years ago

ajaits commented 2 years ago

The UNEnergy data set uses measurementMethod: UNStatsEstimate in the statVarObs mcf.

The dc-import tool generates a warning for this:

{
  "level": "LEVEL_WARNING",
  "location": {
    "file": "un_energy_output.csv",
    "lineNumber": "4"
  },
  "userMessage": "Failed existence check :: reference: 'UNStatsEstimate', property: 'measurementMethod', node: 'E:UNEnergy->E0'",
  "counterKey": "Existence_MissingValueRef_measurementMethod"
}

measurementMethod allows Text in its rangeIncludes:

rangeIncludes: Text, Enumeration

Can we have a flag to ignore specific warnings, such as reference checks for known values, so we can easily catch other warnings/errors in the report.json?

pradh commented 2 years ago

I think we want measurementMethod to refer to entities so we can describe the method used, etc. Thus, I feel like this check as such is useful to have, and would also agree to removing Text from its rangeIncludes.

Adding an option to ignore certain warnings is something we can do. Would we refer to the specific warning using the counter name, like "Existence_MissingValueRef_measurementMethod"?
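To make the counter-name idea concrete, here is a minimal sketch of filtering log entries by counterKey as a post-processing step. It is not a dc-import feature: the helper name drop_ignored is hypothetical, the code assumes the entries have already been extracted from report.json as a list of objects shaped like the warning quoted above, and the second entry and its counter name are invented for illustration.

```python
import json


def drop_ignored(entries, ignored_counters):
    """Return the entries whose counterKey is not in ignored_counters.

    Hypothetical helper for illustration; dc-import does not ship this.
    """
    ignored = set(ignored_counters)
    return [e for e in entries if e.get("counterKey") not in ignored]


# The first entry is copied from the warning in this issue.
# The second entry is made up so the filter has something to keep.
entries = [
    {
        "level": "LEVEL_WARNING",
        "location": {"file": "un_energy_output.csv", "lineNumber": "4"},
        "userMessage": (
            "Failed existence check :: reference: 'UNStatsEstimate', "
            "property: 'measurementMethod', node: 'E:UNEnergy->E0'"
        ),
        "counterKey": "Existence_MissingValueRef_measurementMethod",
    },
    {
        "level": "LEVEL_ERROR",
        "userMessage": "placeholder entry for illustration",
        "counterKey": "Hypothetical_OtherCounter",
    },
]

remaining = drop_ignored(
    entries, ["Existence_MissingValueRef_measurementMethod"]
)
print(json.dumps(remaining, indent=2))
```

Matching on the exact counter name keeps the filter narrow: only the known measurementMethod reference warnings are hidden, and any other existence-check failure still shows up.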

ajaits commented 2 years ago

On Thu, 16 Sept 2021 at 07:24, Prashanth R @.***> wrote:

I think we want measurementMethod to refer to entities so we can describe the method used, etc. Thus, I feel like this check as such is useful to have, and would also agree to removing Text from its rangeIncludes.

Sounds good. measurementMethod has rangeIncludes Enum. Should we convert the existing text ones into Enum? Is there a broader Enum for measurements?

Adding an option to remove warnings is something we can do. Would we refer to the specific warning using counter name, like "Existence_MissingValueRef_measurementMethod" ?

Yes, the counter name or a flag to skip reference check warnings for specific new entities that are being added in a different change to the schema.

Does the dc-import lint use APIs from the autopush or the prod?


pradh commented 2 years ago

Should we convert the existing text ones into Enum?

What's an example of an existing text one? I thought the infra would treat the measurementMethod value as a reference.

Is there a broader Enum for measurements?

https://github.com/datacommonsorg/schema/blob/main/core/measurement_methods.mcf#L7-L11

Does the dc-import lint use APIs from the autopush or the prod?

It uses staging (so as not to impact prod). I've been considering using autopush to get the most recent build, but that instance gets restarted on every code change, so I was worried about reliability...

ajaits commented 2 years ago

For the UN energy data, I used UNStatsEstimate as the measurement method for a subset of values marked as estimates in the input CSV.

I haven't defined this as an enum. I'll add it to the measurement method enum you pointed to.
