buda-base / lds-queries

A repository for BUDA Linked Data Server
Apache License 2.0

queries for the librarians #85

Open eroux opened 2 years ago

eroux commented 2 years ago

per @karmagongde's request:

karmagongde commented 1 year ago

Hi @eroux ,

When you get time, can you please create the above 4 queries so that they work on BUDA?

Since cataloging and scan requests are now created in the BUDA Editor, I need queries to generate CSV files from BUDA. Thanks.

eroux commented 1 year ago

right, sorry for the delay! Just a quick question: how will you use the outline nodes query? Is it just for counting the nodes?

also, do you have an example of an archive report? The link doesn't seem to be working

karmagongde commented 1 year ago

Here is a CSV example for W1KG4334, created from the outline node counts: texts-in-W1KG4334.csv

eroux commented 1 year ago

Thanks! Do you use the result for anything other than counting?

karmagongde commented 1 year ago

Yes, this result is helpful for checking the missing titles and pagination mistakes.

eroux commented 1 year ago

Ok! Do you have an example output for the 3rd query?

karmagongde commented 1 year ago

This is a large file that includes all the archive info. Here is an example: archive-2022-02-15T13 05 19.707Z.csv

eroux commented 1 year ago

oh, ok, how is it used?

karmagongde commented 1 year ago

This is very useful for checking any kind of status, e.g. images that never arrived, whether an outline was created or not, etc.

eroux commented 1 year ago

this full export would be very hard to do (although not impossible). I can export a subset that includes the fields you mention

eroux commented 1 year ago

https://github.com/buda-base/archive-ops/issues/857 is very connected to this

eroux commented 1 year ago

Just documenting something odd: as I was starting the query for the scan requests, I compared my results with the results from tbrc.org, and some results from tbrc.org look a bit buggy. For instance, https://www.tbrc.org/public?module=archive&query=scanrequests&args=04%7C2015 (for the month 04|2015) contains a row for W2PD17369, which is not returned in the results on BUDA. What happens is that the W record was created in June, but there's an earlier scan request object from April. The scan request objects from tbrc.org were not migrated to BUDA, and the query only looks at the log entries saying "added volumeMap for scan request" (in this case the entry is from June). So there may be a few discrepancies like that, but they should be relatively insignificant and probably only affect old data (this is from 8 years ago)

eroux commented 1 year ago

@karmagongde what do you think of

https://purl.bdrc.io/query/table/scanrequests_time?D_FROM=2015-04-01T00:00:00Z&D_TO=2015-05-01T00:00:00Z&format=csv&pageSize=5000

? This is for April 2015, like the example above
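For other months, the same URL can be built programmatically. This is only a sketch: the helper name `month_query_url` is made up for illustration, and it assumes the pattern of the URL above (`D_FROM` is the first day of the month, `D_TO` is the first day of the following month, both in UTC), which is not confirmed anywhere beyond that one example.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Base of the table query URLs, taken from the example in this thread.
BASE = "https://purl.bdrc.io/query/table/"


def month_query_url(table, year, month, page_size=5000):
    """Build a CSV query URL covering one calendar month (hypothetical helper)."""
    start = datetime(year, month, 1, tzinfo=timezone.utc)
    # Roll over to January of the next year after December.
    if month == 12:
        end = datetime(year + 1, 1, 1, tzinfo=timezone.utc)
    else:
        end = datetime(year, month + 1, 1, tzinfo=timezone.utc)
    params = {
        "D_FROM": start.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "D_TO": end.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "format": "csv",
        "pageSize": page_size,
    }
    # Keep ":" unescaped so the result looks like the URL in this thread.
    return BASE + table + "?" + urlencode(params, safe=":")


print(month_query_url("scanrequests_time", 2015, 4))
```

Calling it with `"scanrequests_time", 2015, 4` reproduces the April 2015 URL above.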

karmagongde commented 1 year ago

@eroux This URL for generating the list of scan requests for the desired month is working perfectly as I needed. Thank you.

eroux commented 1 year ago

great! I've transformed the title into Unicode. I've also added the one about syncs:

https://purl.bdrc.io/query/table/syncs_time?D_FROM=2023-03-01T00:00:00Z&D_TO=2023-04-01T00:00:00Z&format=csv&pageSize=5000
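If it helps, the CSV can also be loaded from a script rather than the browser. A rough sketch, assuming the response is UTF-8 text whose first line is a header row (I haven't documented the column names here, so the code doesn't rely on any); `fetch_csv_text` and `parse_rows` are made-up names.

```python
import csv
import io
from urllib.request import urlopen


def fetch_csv_text(url):
    """Download the CSV export as text (assumes a UTF-8 response)."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8")


def parse_rows(csv_text):
    """Turn CSV text with a header row into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


# Usage (needs network access):
# url = ("https://purl.bdrc.io/query/table/syncs_time"
#        "?D_FROM=2023-03-01T00:00:00Z&D_TO=2023-04-01T00:00:00Z"
#        "&format=csv&pageSize=5000")
# rows = parse_rows(fetch_csv_text(url))
# print(len(rows), "rows")
```

Keeping the download and the parsing separate means the parsing step can also be run on a file that was already saved locally.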

karmagongde commented 1 year ago

The syncs URL is fine except for the "igc" column for multi-volume works. For instance: bdr:W1KG16002, བོད་ཀྱི་ཚན་རིག་དུས་དེབ།, 2800 (the actual volume no. is 200).

eroux commented 1 year ago

right, thanks for spotting it! just fixed it