gbif / portal-feedback

User feedback for the GBIF API, website and published data. You can ask questions here. 🗨❓

Show crawl history and current state #407

Open gbif-portal opened 7 years ago

gbif-portal commented 7 years ago

Show crawl history and current state

As a publisher I would like to understand if my dataset has been indexed fine or if there are issues that prevent it from being crawled.

A history of number of records over time and maybe issues would also be useful.


fbitem-dataset64dabd3c-4f34-4520-b9dd-d227a0bf1582
Reported by: mdoering@gbif.org
System: Chrome 60.0.3112 / Mac OS X 10.12.5
Referer: https://www.gbif.org/dataset/64dabd3c-4f34-4520-b9dd-d227a0bf1582
Window size: width 1667 - height 886

mdoering commented 7 years ago

The request originally came from this German provider.

thomasstjerne commented 7 years ago

Actually most of it can be seen here already: https://www.gbif.org/dataset/64dabd3c-4f34-4520-b9dd-d227a0bf1582#dataDescription

But I definitely agree that the reasons for the 4 failed crawling attempts would be valuable information.

thomasstjerne commented 7 years ago

See also https://github.com/gbif/portal16/issues/448

MortenHofft commented 7 years ago

The reasons for failed attempts are not available in the API, are they? @mdoering, is there information in the API that you believe we should expose? http://api.gbif.org/v1/dataset/64dabd3c-4f34-4520-b9dd-d227a0bf1582/process

"A history of number of records over time and maybe issues would also be useful."

That is not information we have at this point, is it? I can see how it would be useful. I'm labelling this as an API suggestion for now. Please correct me if the information is already there.
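For anyone who wants to check what the process endpoint linked above returns for a dataset, here is a minimal Python sketch. It assumes the response is a paged JSON object with a `results` list, that each entry may carry a `finishReason` field, and that the endpoint accepts `limit` and `offset` parameters; none of this is confirmed in this thread. The function names and the `MISSING` label are made up for illustration.

```python
import json
from collections import Counter
from urllib.request import urlopen

API_BASE = "https://api.gbif.org/v1"


def fetch_process_page(dataset_key, limit=20, offset=0):
    """Fetch one page of crawl attempts from the dataset's /process endpoint.

    Assumption: the endpoint supports limit/offset paging.
    """
    url = f"{API_BASE}/dataset/{dataset_key}/process?limit={limit}&offset={offset}"
    with urlopen(url) as response:
        return json.load(response)


def count_finish_reasons(page):
    """Tally crawl attempts on one page by their finish reason.

    Assumption: the page is a dict with a "results" list, and each item may
    carry a "finishReason" string. Attempts without one are counted under
    the hypothetical label "MISSING".
    """
    results = page.get("results") or []
    return Counter(item.get("finishReason") or "MISSING" for item in results)
```

Calling `count_finish_reasons(fetch_process_page("64dabd3c-4f34-4520-b9dd-d227a0bf1582"))` would then give a quick overview of how the most recent crawl attempts ended. It would not explain why an attempt failed.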

mdoering commented 7 years ago

For records over time we have various numbers per crawl, e.g. fragmentsEmitted is what we find in the raw data. Maybe it's worth showing this in a graph for the successful crawls?
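A rough Python sketch of how such a series could be derived from a list of crawl entries, ready to feed into a chart. Only `fragmentsEmitted` is named in this thread; the `startedCrawling` and `finishReason` fields, and `NORMAL` as the marker of a successful crawl, are assumptions about the response, and `records_over_time` is a hypothetical helper.

```python
def records_over_time(results, ok_reason="NORMAL"):
    """Return (start time, fragmentsEmitted) pairs for successful crawls.

    Assumptions: each entry is a dict that may contain "finishReason",
    "startedCrawling" (an ISO 8601 timestamp string) and "fragmentsEmitted".
    Entries that are unsuccessful or lack either value are skipped.
    """
    series = []
    for item in results:
        if item.get("finishReason") != ok_reason:
            continue
        started = item.get("startedCrawling")
        emitted = item.get("fragmentsEmitted")
        if started is None or emitted is None:
            continue
        series.append((started, emitted))
    # ISO 8601 strings in one consistent format and time zone sort chronologically.
    series.sort(key=lambda point: point[0])
    return series
```

The resulting list of pairs can be passed to any plotting library. Filtering on the finish reason keeps aborted or failed attempts from showing up as misleading dips in the record count.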

For checklists we store these metrics for every version indexed. I just can't see how they are exposed right now, apart from the latest: https://api.gbif.org/v1/dataset/7ddf754f-d193-4cc9-b351-99906754a03b/metrics

MattBlissett commented 5 years ago

Some of this is available in the Management Tools (https://management-tools.gbif.org/). Users whose email address matches one of those listed for the dataset/installation/organization/node will see links from the dataset page to the relevant tools.