Closed: kcmcleod closed this issue 5 years ago
Ideally this page would replace the live deploys page.
Current draft: https://lxbisel.macs.hw.ac.uk:8080/EE-WebApp/stats
Not pretty. Needs sorting. Perhaps a pie chart? Possibly a list of top-level URLs?
We probably want some high-level stats, e.g. the number of resources (databases), the number of pages, and the number of pages by type. I'm not sure that the number of triples is that meaningful.
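For illustration, a query along these lines could produce those numbers, assuming the scraped markup is loaded into a SPARQL endpoint with one named graph per scraped page (the graph layout is my assumption for the sketch, not the Scraper's confirmed storage model):

```sparql
# Sketch: pages and entities per type, assuming one named graph per scraped page.
SELECT ?type (COUNT(DISTINCT ?page) AS ?pages) (COUNT(DISTINCT ?entity) AS ?entities)
WHERE {
  GRAPH ?page { ?entity a ?type . }
}
GROUP BY ?type
ORDER BY DESC(?pages)
```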
For the by-type breakdown we may want to restrict the list to Bioschemas types of interest, i.e. omit blog.
There is probably no need to show the full URLs of types; their hyperlinked names would suffice. Can we order the types by name, or even let the user dynamically sort by name or size?
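Both points could be handled in the query itself; a sketch under the same assumptions as above (the VALUES list of types of interest and the https form of the schema.org namespace are illustrative, not definitive):

```sparql
# Sketch: restrict the breakdown to types of interest and order by size;
# swap the ORDER BY clause for ASC(?type) to sort by name instead.
PREFIX schema: <https://schema.org/>

SELECT ?type (COUNT(DISTINCT ?page) AS ?pages)
WHERE {
  # Extend this list with the Bioschemas profile types we care about.
  VALUES ?type { schema:DataCatalog schema:Dataset schema:Event }
  GRAPH ?page { ?entity a ?type . }
}
GROUP BY ?type
ORDER BY DESC(?pages)
```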
We possibly want some way of drilling down into which resources have been marked up with which types, and then links to a structured data testing tool for an example page, similar to the live deploys page. We want to make it easy for people to get to the markup of a resource so they can copy it and hack it to their needs.
We could also have some way of getting a list of all URLs that have been indexed.
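One way to support the drill-down, again assuming one named graph per scraped page: pick an example page per type, which the stats page could then link straight to a testing tool.

```sparql
# Sketch: one example page per type, suitable as the target of a
# testing-tool link, plus the number of pages carrying that type.
SELECT ?type (SAMPLE(?page) AS ?examplePage) (COUNT(DISTINCT ?page) AS ?pages)
WHERE {
  GRAPH ?page { ?entity a ?type . }
}
GROUP BY ?type
```

Listing every indexed URL could then just be `SELECT DISTINCT ?page WHERE { GRAPH ?page { ?s ?p ?o } }`, paged with LIMIT/OFFSET.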
Stats page is returning an error
```
2019/09/03 09:16:08 [error] 3384#0: *2073 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 137.195.27.39, server: lxbisel.macs.hw.ac.uk, request: "GET /EE-WebApp/stats HTTP/1.1", upstream: "http://127.0.0.1:8081/EE-WebApp/stats", host: "lxbisel.macs.hw.ac.uk:8080", referrer: "https://github.com/HW-SWeL/Scraper/issues/13"
```
Stats page is returning an error
Need to pre-calculate the answers in a summary graph... todo!
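A possible shape for that pre-calculation, sketched under the same named-graph-per-page assumption; the summary graph IRI and the ex: property are placeholders invented for the sketch:

```sparql
# Sketch: materialise the per-type page counts into a dedicated summary
# graph (e.g. after each crawl) so the stats page reads a few dozen
# triples instead of aggregating over the whole store on every request.
PREFIX ex: <http://example.org/stats#>

# Clear any previous summary, then rebuild it.
CLEAR SILENT GRAPH <http://example.org/graphs/summary> ;

INSERT {
  GRAPH <http://example.org/graphs/summary> { ?type ex:pageCount ?pages . }
}
WHERE {
  {
    SELECT ?type (COUNT(DISTINCT ?page) AS ?pages)
    WHERE { GRAPH ?page { ?entity a ?type . } }
    GROUP BY ?type
  }
}
```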
For the by-type breakdown we may want to restrict the list to Bioschemas types of interest, i.e. omit blog.
Done.
There is probably no need to show the full URLs of types; their hyperlinked names would suffice.
Done
Can we order the types by name, or even let the user dynamically sort by name or size?
Done: https://github.com/HW-SWeL/BSKgE/commit/0b19b2f73687c221e271325344b6963e5aa498c6
We possibly want some way of drilling down into which resources have been marked up with which types, and then links to a structured data testing tool for an example page, similar to the live deploys page. We want to make it easy for people to get to the markup of a resource so they can copy it and hack it to their needs.
Search by type? Or do you mean something more complex? Making it easy to copy markup may have issues. Firstly, you propagate junk. Secondly, copyright.
We could also have some way of getting a list of all URLs that have been indexed.
If you mean sites, OK. If you actually mean URLs, I imagine that would be way too slow...
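A possible middle ground, under the same assumptions as the sketches above: collapse page URLs to their host, so the listing scales with the number of sites rather than the number of pages (the REPLACE regex is a rough host extractor, not proper URL parsing, and in practice this count would itself want to come from the pre-computed summary graph):

```sparql
# Sketch: sites (hosts) with the number of indexed pages per site.
SELECT ?site (COUNT(DISTINCT ?page) AS ?pages)
WHERE {
  GRAPH ?page { ?s ?p ?o . }
  BIND (REPLACE(STR(?page), "^(https?://[^/]+).*", "$1") AS ?site)
}
GROUP BY ?site
ORDER BY DESC(?pages)
```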
We possibly want some way of drilling down into which resources have been marked up with which types, and then links to a structured data testing tool for an example page, similar to the live deploys page. We want to make it easy for people to get to the markup of a resource so they can copy it and hack it to their needs.
Moving this into a new issue, as the rest is done.
How many pages scraped? How many DataCatalogs? etc
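For instance, headline numbers like these could come from a pair of counts, same assumptions as the sketches above (the schema.org namespace form may need adjusting to whatever the Scraper normalises to):

```sparql
# Sketch: total pages scraped, and how many of them declare a DataCatalog.
PREFIX schema: <https://schema.org/>

SELECT ?pagesScraped ?dataCatalogPages
WHERE {
  { SELECT (COUNT(DISTINCT ?page) AS ?pagesScraped)
    WHERE { GRAPH ?page { ?s ?p ?o . } } }
  { SELECT (COUNT(DISTINCT ?page) AS ?dataCatalogPages)
    WHERE { GRAPH ?page { ?cat a schema:DataCatalog . } } }
}
```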