Open sladen opened 4 years ago
According to the instructions provided by Stefan (Wü), Hiwis were asked, after downloading a book from BSB, to remove the tag "GibtEsBeiBSB" and add "BSBDownloadFertig".
So books tagged "BSBDownloadFertig" should already have OCR, while books tagged "GibtEsBeiBSB" should not.
So I ran a small check on all of this. I checked the following cases and got the following results:
I hope this was the information that was requested by you two. The first part "inBSB_noOCR" is from a previous issue.
I will upload the files here. They are also created automatically on every server start and can be found at /etc/clas-digital-devel/[filename].txt
(The filename is the first element of each list entry, e.g. "inBSB_noOCR".)
BSBDownLoadFertig_noOCR.txt -> 0 books
gibtEsBeiBSB_noOCR.txt -> about 145 books
gibtEsBeiBSB_OCR.txt -> about 3 books
inBSB_noOcr.txt -> about 315
I think this is all expected behavior overall, apart from the three books in the category "gibtEsBeiBSB_OCR"; here the tag probably needs to be changed, since these books already have OCR.
Each file lists the Zotero key first for easy double-checking, followed by author, title and year.
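For reference, the check described above can be sketched roughly as follows. This is only an illustration, not the clas-digital implementation: the record layout (a dict with "key", "author", "title", "year", "tags" and "has_ocr") and the " | " separator are assumptions, and "inBSB_noOCR" is left out because its criterion comes from a previous issue.

```python
# Hypothetical record layout (an assumption, not the clas-digital format):
# {"key": str, "author": str, "title": str, "year": int,
#  "tags": set of str, "has_ocr": bool}

# Report name -> (tag that must be present, required OCR state).
REPORTS = {
    "BSBDownLoadFertig_noOCR": ("BSBDownloadFertig", False),
    "gibtEsBeiBSB_noOCR": ("GibtEsBeiBSB", False),
    "gibtEsBeiBSB_OCR": ("GibtEsBeiBSB", True),
}


def format_line(book):
    """Zotero key first, then author, title and year."""
    return " | ".join(
        [book["key"], book["author"], book["title"], str(book["year"])]
    )


def build_reports(books):
    """Sort books into the report lists by tag and OCR presence."""
    reports = {name: [] for name in REPORTS}
    for book in books:
        for name, (tag, has_ocr) in REPORTS.items():
            if tag in book["tags"] and book["has_ocr"] == has_ocr:
                reports[name].append(format_line(book))
    return reports
```

A book tagged "BSBDownloadFertig" that has OCR is the expected state and therefore ends up in none of the lists.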
Took care of gibtEsBeiBSB_OCR.txt. Please leave this issue open, as PaulM and I will be working on gibtEsBeiBSB_noOCR.txt.
Can I please have current versions of
This is what we have right now: gibtEsBeiBSB_noOCR.txt inBSB_noOcr.txt
Thank you, but these are not correct/up to date. I just randomly checked three cases from gibtEsBeiBSB_noOCR.txt which I remember having uploaded already, and indeed, they have scans on the server! (UR2BG6KI, JUUSJMEX, QKUVL5X8). There were about 145 cases, now there are about 175... We did add more entries, but also uploaded scans for them.
Sorry, I gave you the data from the development server, which is of course wrong. Here is the new data: gibtEsBeiBSB_noOCR.txt inBSB_noOcr.txt
We would like a list of all books from the collection "Geschichte des Tierwissens" that have no scan/OCR on the server. Information and format as above.
Give us a few days for this. It's not hard to implement, but I guess Alex and I will provide a small patch for clas-digital and I will integrate this into the patch. I guess maybe I can provide the list by Thursday morning. If you need it earlier, please let me know.
Thursday morning is ok!
Can we not just get the list through the catalogue?
Sorry, this is not possible yet, see #255.
Ok, then we will work with the two lists provided above, that should give you until next week before we "run dry" again.
As #255 has been closed - is it possible now to get the list?
Oh, I think there was a misunderstanding. I thought it would be enough to look in the catalogue, where you can immediately see all books without OCR. But if that is not what you need, I will create the list today and send it to you right away.
Because the catalogue is exactly that list: https://www.clas-digital.uni-frankfurt.de:9991/catalogue/collections/RFWJC42V/
Yes, I know, but I need a list where I can also make notes and save it etc. (from the production server, please)
These lists help the two content Hiwis and me with data procurement. For example, when you order something from the BSB, you can note there which BSB ID the order has, so that a week later, when the BSB delivers, you can match the order to a Zotero ID; you can also note when problems occur somewhere, etc.
The lists we had before (gibtEsBeiBSB_noOCR.txt) had exactly the right format. As the next step we now just need a list of the books in the collection without OCR, regardless of tags etc., because we have worked through the first lists. So this is not about features for us, but about assistance: sure, I can copy-paste the list into a document, delete all lines with OCR, etc., but I thought there might be another/faster way.
Sure, you will probably get the list this evening!
Functionality is now implemented.
Sorry for taking so long. Here we finally are.
Mainly books in the curated catalogue have an entry for:
Only half of these appear in the database in fully-scanned/searchable form. Many entries instead have tags applied, in the form of:
Ideally, this should be clarified; it needs proper analysis by reading the JSON, versus a quick grep.
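To illustrate why reading the JSON is more reliable than grep: a text search matches the tag string anywhere in the file (for example inside a title or note), whereas parsing only looks at the tags themselves. The sketch below assumes Zotero-API-style items, where tags sit under "data" as a list of objects with a "tag" field; the layout of the JSON actually stored on the server may differ.

```python
def tags_of(item):
    """Return the set of tag names of one item.

    Assumes Zotero-API-style JSON: {"data": {"tags": [{"tag": "..."}]}}.
    Falls back to a top-level "tags" list if there is no "data" object.
    """
    data = item.get("data", item)
    return {t["tag"] for t in data.get("tags", []) if "tag" in t}


def items_with_tag(items, tag):
    """Return only the items that really carry the given tag."""
    return [item for item in items if tag in tags_of(item)]
```

With the items loaded via json.load, items_with_tag(items, "GibtEsBeiBSB") returns only entries that carry the tag, not entries that merely mention the string elsewhere.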