edmcouncil / fibo

The Financial Industry Business Ontology (FIBO) defines the sets of things that are of interest in financial business applications and the ways that those things can relate to one another. In this way, FIBO can give meaning to any data (e.g., spreadsheets, relational databases, XML documents) that describe the business of finance.
https://spec.edmcouncil.org/fibo/
MIT License

Possible "broken" URLs in FIBO PROD literals #2018

Open mereolog opened 1 month ago

mereolog commented 1 month ago

The attached table collects 500+ triples from FIBO PROD (commit: dbdd526) whose object value is a URL that possibly no longer resolves.

fibo_prod_possibly_broken_urls_dbdd526.xlsx
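A minimal sketch of how such URL-valued triples might be picked out, assuming the triples have already been exported (e.g., from a SPARQL query over FIBO PROD) as plain `(subject, predicate, object)` string tuples. The function names and the input shape are hypothetical, not how the attached table was actually produced.

```python
from urllib.parse import urlsplit

# Hypothetical input shape: (subject, predicate, object) as plain strings.
Triple = tuple[str, str, str]


def looks_like_url(value: str) -> bool:
    """Return True if the literal parses as an absolute http(s) URL."""
    parts = urlsplit(value.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def url_valued_triples(triples: list[Triple]) -> list[Triple]:
    """Keep only the triples whose object value is an http(s) URL."""
    return [t for t in triples if looks_like_url(t[2])]


if __name__ == "__main__":
    sample = [
        ("ex:A", "skos:definition", "a plain-text definition"),
        ("ex:B", "rdfs:seeAlso", "http://www.otcmarkets.com"),
    ]
    print(url_valued_triples(sample))
```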

The check is not 100% precise: for example, links in the https://www.ffiec.gov domain, such as https://www.ffiec.gov/npw/Help/InstitutionTypes or https://www.ffiec.gov/nicpubweb/Content/DataDownload/NPW%20Data%20Dictionary.pdf, are classified as 403 (Forbidden) even though they can be accessed via a web browser.
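Since a 403 returned to an automated checker does not necessarily mean the page is gone (one common cause is a server refusing non-browser clients), it may help to triage by status code rather than treat every non-2xx response as broken. The sketch below is a hypothetical policy: the bucket names and the set of "review" codes are assumptions, not the rules used for the attached table.

```python
from enum import Enum


class LinkStatus(Enum):
    OK = "ok"
    NEEDS_REVIEW = "needs review"
    BROKEN = "broken"


# Hypothetical policy: codes that often mean "the server refused this client"
# or "try again later" rather than "the resource no longer exists".
REVIEW_CODES = {401, 403, 429}


def classify(status: int) -> LinkStatus:
    """Map an HTTP status code seen by an automated checker to a triage bucket."""
    if 200 <= status < 400:
        return LinkStatus.OK
    if status in REVIEW_CODES or 500 <= status < 600:
        return LinkStatus.NEEDS_REVIEW
    # Everything else (e.g., 404 Not Found, 410 Gone) is treated as broken.
    return LinkStatus.BROKEN
```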

In many other cases, though not all of them, the 'http' scheme should be updated to 'https'.
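Because the upgrade is not valid for every URL, one cautious approach is to propose the `https` form only when a separate reachability check confirms it. A minimal sketch, assuming a caller-supplied `is_reachable` function (hypothetical; it could wrap whatever HTTP client the checker uses):

```python
from typing import Callable, Optional
from urllib.parse import urlsplit, urlunsplit


def https_candidate(url: str) -> Optional[str]:
    """Return the https form of an http URL, or None if no simple swap applies."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        return None
    if parts.port is not None:
        # An explicit port (e.g. :80) would be wrong under https; leave for manual review.
        return None
    return urlunsplit(parts._replace(scheme="https"))


def suggest_upgrade(url: str, is_reachable: Callable[[str], bool]) -> Optional[str]:
    """Suggest the https URL only if the supplied check says it resolves."""
    candidate = https_candidate(url)
    if candidate is not None and is_reachable(candidate):
        return candidate
    return None
```

Keeping the network check outside the function makes the rewrite rule easy to test on its own and lets the same rule be reused with different clients.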

ElisaKendall commented 1 month ago

@mereolog - after discussion on the FIBO DER telecon this afternoon, we think you should exclude 403 errors. We did a spot check on those, and they resolved via a browser, as you mentioned, so that "doesn't count" as broken.
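Applying that decision to the report amounts to a simple filter on the recorded status code. A minimal sketch, assuming each check result is stored as a record with the HTTP status the checker received; the `CheckResult` type is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    subject: str
    predicate: str
    url: str
    status: int  # HTTP status code returned to the automated checker


def exclude_forbidden(results: list[CheckResult]) -> list[CheckResult]:
    """Drop 403 results, which resolved in a browser and so do not count as broken."""
    return [r for r in results if r.status != 403]
```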

ElisaKendall commented 1 month ago

@mereolog - also per our conversation on the FIBO DER telecon, please exclude all of the URLs identified in the MarketsIndividuals ontology in FBC: they are provided by ISO, so we have no control over what ISO lists as the website of an exchange or other market participant.
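One way to apply this exclusion is to drop findings whose subject IRI falls under the MarketsIndividuals namespace. A minimal sketch, assuming the individuals in that ontology share a common IRI prefix; the prefix below is a hypothetical placeholder and must be replaced with the ontology's actual namespace.

```python
# Hypothetical placeholder: substitute the real namespace IRI of the
# FBC MarketsIndividuals ontology before use.
MARKETS_INDIVIDUALS_PREFIX = "https://example.org/fibo/FBC/MarketsIndividuals/"

Finding = tuple[str, str, str]  # (subject IRI, predicate, URL literal)


def exclude_iso_sourced(
    findings: list[Finding], prefix: str = MARKETS_INDIVIDUALS_PREFIX
) -> list[Finding]:
    """Drop findings about individuals defined under the given namespace prefix."""
    return [f for f in findings if not f[0].startswith(prefix)]
```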

mereolog commented 1 month ago

> @mereolog - after discussion on the FIBO DER telecon this afternoon, we think you should exclude 403 errors. We did a spot check on those, and they resolved via a browser, as you mentioned, so that "doesn't count" as broken.

I would suggest checking which URL the browser actually resolves to. For example, when you request http://www.otcmarkets.com, you are redirected to https://www.otcmarkets.com, so some of these 403s can be interpreted as warnings that the URL scheme needs updating.
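A minimal sketch of that check: fetch the URL, let the standard library follow redirects, and flag cases where the only change is `http` to `https`. The browser-like `User-Agent` header is an assumption (some servers reject the default Python client), and `urlopen` raises `urllib.error.HTTPError` if the final response is still a 4xx/5xx such as 403. The function names are hypothetical.

```python
import urllib.request
from urllib.parse import urlsplit


def final_url(url: str, timeout: float = 10.0) -> str:
    """Fetch url and return the URL reached after any redirects.

    urlopen follows redirects by default; raises HTTPError on 4xx/5xx.
    """
    # Assumption: a browser-like User-Agent avoids some server-side blocking.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.url


def is_scheme_upgrade(original: str, resolved: str) -> bool:
    """True if the redirect changed only http -> https (same host, path, query)."""
    a, b = urlsplit(original), urlsplit(resolved)
    return (
        a.scheme == "http"
        and b.scheme == "https"
        and a.netloc.lower() == b.netloc.lower()
        and (a.path or "/") == (b.path or "/")
        and a.query == b.query
    )
```

Usage: `is_scheme_upgrade(u, final_url(u))` marks a literal as a candidate for a scheme update rather than as broken.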