CopticScriptorium / cts

Coptic Scriptorium's website for reading digitized Coptic texts and CTS URN resolution
http://data.copticscriptorium.org
Apache License 2.0

Search tools need to ingest the document metadata from ANNIS and not the corpus metadata #54

Closed: lukehollis closed this issue 9 years ago

lukehollis commented 9 years ago

Per @ctschroeder email: "Those filters should filter by document metadata. The filters will be inaccurate otherwise."

lukehollis commented 9 years ago

Search is broken for a moment while I work on this.

lukehollis commented 9 years ago

Okay, search tools are now using document-specific metadata instead of corpus metadata. This reopens the splittable-flag question from https://github.com/CopticScriptorium/cts/issues/22: the number of texts makes the administrative task of manually configuring the splittable items and associating them with the necessary Texts prohibitively time-consuming. As it stands, the search field values are left unsplit. Let me know what you both would like to do to move forward here.

lukehollis commented 9 years ago

One other small follow-up issue here: since the texts change with each ingest but the corpora stay the same, the search field ingest will need to be rerun after each text ingest.

ctschroeder commented 9 years ago

I don't understand why we can't automate this. Is it really too hard on the computer? This doesn't really make sense to me; ANNIS, for example, searches far more texts for elements than we are asking for in this filter. (I'm referring also to https://github.com/CopticScriptorium/cts/issues/22#issuecomment-84244433, which asks for a list of presets.) Alternatively, can we set it up to automatically ingest the terms from the document-level API and associate them with the necessary text at each ingest, rather than manually creating a list of presets? In other words, have the presets created with the ingest? I still don't understand why the first part is so difficult, but I'm offering this as an alternative. This is something Amir will need to look at when he is able; I can't make this call on my own.

lukehollis commented 9 years ago

Hey Carrie,

Good question! ANNIS is able to perform searches with these filters because it already has all its data in a format it can recognize. The problem with ingesting the search fields is converting an element like "name1, name2, name3" into distinct search field values for the search tools. When the ingest occurs, without a list of presets or a splittable flag from ANNIS, the app has no way of knowing how to split the element into "name1", "name2", and "name3". It has nothing to do with the actual search functionality, only with ingesting the metadata from ANNIS.
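A minimal Python sketch of the splitting step described above, assuming each search field carries a boolean splittable flag and that values are comma-delimited. The function name `split_field_value` is hypothetical, not part of the resolver's code:

```python
def split_field_value(raw: str, splittable: bool, delimiter: str = ",") -> list[str]:
    """Turn one raw ANNIS metadata value into distinct search-field values."""
    if not splittable:
        # Without a splittable flag, the whole string stays a single value.
        return [raw.strip()] if raw.strip() else []
    values: list[str] = []
    for piece in raw.split(delimiter):
        value = piece.strip()
        # Drop empty pieces and duplicates, keeping first-seen order.
        if value and value not in values:
            values.append(value)
    return values


print(split_field_value("name1, name2, name3", splittable=True))
# ['name1', 'name2', 'name3']
print(split_field_value("name1, name2, name3", splittable=False))
# ['name1, name2, name3']
```

With the flag off, the whole string survives as one value, which is the unsplit behavior described earlier in this thread.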

We could definitely set up the search terms to refresh automatically after each text ingest.
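One way the automatic refresh could look, sketched in Python under stated assumptions: document metadata arrives as a dict of document id to field to raw string, and a hypothetical `SPLITTABLE` set names the comma-delimited fields. The sample documents are placeholders. Rebuilding the whole index from document-level metadata after every text ingest keeps the filters in step with the current texts:

```python
from collections import defaultdict

# Hypothetical: fields whose ANNIS values are comma-delimited lists.
SPLITTABLE = {"annotation", "translation"}


def build_search_index(documents: dict[str, dict[str, str]]) -> dict[str, dict[str, set[str]]]:
    """Map field -> value -> ids of documents carrying that value.

    Only document-level metadata is read; corpus metadata is ignored.
    """
    index: defaultdict = defaultdict(lambda: defaultdict(set))
    for doc_id, metadata in documents.items():
        for field, raw in metadata.items():
            pieces = raw.split(",") if field in SPLITTABLE else [raw]
            for piece in pieces:
                value = piece.strip()
                if value:
                    index[field][value].add(doc_id)
    # Plain dicts, so looking up an unknown value doesn't insert an empty entry.
    return {field: dict(values) for field, values in index.items()}


def refresh_search_fields(documents: dict[str, dict[str, str]]) -> dict[str, dict[str, set[str]]]:
    # Hypothetical hook: call at the end of every text ingest.
    return build_search_index(documents)


docs = {  # placeholder sample data
    "mark_1_edited": {"annotation": "Rebecca S. Krawiec, Annotator B"},
    "mark_1_unedited": {"annotation": "Annotator B"},
}
index = refresh_search_fields(docs)
print(index["annotation"]["Rebecca S. Krawiec"])  # {'mark_1_edited'}
```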

ctschroeder commented 9 years ago

I see you have been working on this, thanks! I see that now the terms in the menu bar are all different than what they were before. I'm not really sure what criteria were used in deciding what to put in the menu bar. It would be helpful to consult with us first before changing these kinds of things, since we've already had a couple of conversations about the menu bar.

Apologies if this is a work in progress and this feedback is not helpful at this time. I had to go to the site to help a researcher with a urn for citation in a publication and noticed the changes.

  1. We need "corpora" back on the menu bar.
  2. Annotation now gives the entire field instead of the individual names. Please see @amir-zeldes's recent emails on this. He has given instructions on the splittable elements and on not using a list of presets.
  3. Collection is probably not the most helpful item. Please hide or replace it. Repository or manuscript would be more helpful.
  4. Coptic edition is also not helpful given our current metadata structure. Please hide or replace it.

Other useful tabs could be manuscript idno (e.g., options would be MONB.YA, MONB.ZC, etc.), repository, translation (only if there is a splittable comma-delimited field like for the annotation).

Many thanks!

lukehollis commented 9 years ago

Carrie, the changes in the search menu bar only reflect using the document-specific metadata from ANNIS instead of the corpora metadata. As @amir-zeldes asked, they're totally customizable by an administrator and can be ordered in any way to show the menu options you wish to show. Apologies if this was confusing. I was following through on this ticket by ingesting the document metadata for search fields. I removed the advanced search from the menu bar at your request.

Please give me a list of the fields that you would like to show in the menu bar, and I can start getting things set up there. These can be changed any time in the future.

I'm still working on the splittable field but haven't been able to implement the solution Amir proposed in the other ticket yet. Thank you for your patience.

lukehollis commented 9 years ago

Here's the ticket about ordering the search fields for reference: https://github.com/CopticScriptorium/cts/issues/3. Currently the top 4 show in the primary search menu and the remainder show in the advanced search menu that you requested be hidden. We can adjust this number (top 4) to whatever you wish.
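A sketch of the top-N rule in Python, assuming each search field has an administrator-assigned position. The `SearchField` class and `split_menus` function are hypothetical; changing `primary_count` is the adjustment offered above:

```python
from dataclasses import dataclass


@dataclass
class SearchField:
    name: str
    position: int  # administrator-assigned; lower numbers are shown first


def split_menus(fields: list[SearchField], primary_count: int = 4) -> tuple[list[str], list[str]]:
    """Return (primary menu, advanced menu) field names in display order."""
    ordered = [f.name for f in sorted(fields, key=lambda f: f.position)]
    return ordered[:primary_count], ordered[primary_count:]


fields = [
    SearchField("language", 5),
    SearchField("corpus", 1),
    SearchField("translation", 4),
    SearchField("msName", 2),
    SearchField("annotation", 3),
]
primary, advanced = split_menus(fields)
print(primary)   # ['corpus', 'msName', 'annotation', 'translation']
print(advanced)  # ['language']
```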

ctschroeder commented 9 years ago

Ok, thanks very much for this clarification. That explains a lot. Yes: corpus, msName, annotation, translation, language come to mind as the most helpful. Good luck with the split fields.

lukehollis commented 9 years ago

Okay, terrific! I'll configure these search fields to appear in the search tools area. Thanks, and more updates soon.

lukehollis commented 9 years ago

Okay, the corpus, msName, annotation, translation, and language search fields are now listed in the search tools. Which search tools appear and their order can be changed by an administrator at any time in the future.

ctschroeder commented 9 years ago

Not closing yet, because there was some issue with the metadata ingest, which we discussed today.

lukehollis commented 9 years ago

Both document metadata and corpus metadata are now ingested from ANNIS, but per our discussions, only document metadata is used for the search tools.

ctschroeder commented 9 years ago

I tested the filters based on metadata and am getting some weird results, like two iterations of the Bible texts here: http://coptic.aweb.io/filter/annotation=7641:Rebecca%20S.%20Krawiec It looks like the filter is returning both the edited and the unedited Sahidica Gospel of Mark chapters, but Krawiec is an annotator only on the edited Sahidica chapters.

Also, the corpus-level URNs are wrong. Something odd is going on with how you are getting those corpus-level URNs. Edited Gospel of Mark should be urn:cts:copticLit:nt.mark.sahidica_ed, not urn:cts:copticLit:nt.mark. Unedited Gospel of Mark should be urn:cts:copticLit:nt.mark.sahidica, not urn:cts:copticLit:nt. I will make a new issue or add it to the correct one.
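For reference, a version-level CTS URN has the shape `urn:cts:<namespace>:<textgroup>.<work>.<version>`, so the corpus-level URN needs all three dotted parts. A minimal Python sketch (the function `version_urn` is hypothetical, and this does not claim to identify the cause of the bug) that refuses to build a URN when a component is missing, instead of silently returning a truncated one like `urn:cts:copticLit:nt.mark`:

```python
def version_urn(namespace: str, textgroup: str, work: str, version: str) -> str:
    """Build a version-level CTS URN, e.g. urn:cts:copticLit:nt.mark.sahidica_ed."""
    for label, part in (("namespace", namespace), ("textgroup", textgroup),
                        ("work", work), ("version", version)):
        # An empty part would yield a truncated URN; '.' and ':' are URN separators.
        if not part or "." in part or ":" in part:
            raise ValueError(f"invalid {label}: {part!r}")
    return f"urn:cts:{namespace}:{textgroup}.{work}.{version}"


print(version_urn("copticLit", "nt", "mark", "sahidica_ed"))
# urn:cts:copticLit:nt.mark.sahidica_ed
```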

ctschroeder commented 9 years ago

I did some more testing. The filters based on metadata are totally wrong. For example, selecting Luckritz Marquis is pulling up tons of wrong documents. Check to be sure you are pulling the metadata from document level, not the corpus level. I think the filters are operating on the corpus level metadata.

ctschroeder commented 9 years ago

Something's still wrong. Luckritz Marquis still pulls up documents she didn't annotate.

lukehollis commented 9 years ago

I'm showing Luckritz Marquis as an annotator on 5 documents, but it looks like the metadata from ANNIS might be incorrect for 1_Corinthians_10 and 1_Corinthians_11. Here are the links to what the resolver's ingesting: http://corpling.uis.georgetown.edu/annis-service/annis/meta/doc/sahidica.nt/1_Corinthians_10 http://corpling.uis.georgetown.edu/annis-service/annis/meta/doc/sahidica.nt/1_Corinthians_11
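Both links share one pattern, `.../annis-service/annis/meta/doc/<corpus>/<document>`. A small Python helper, hypothetical and based only on that pattern (the service may have moved since), generates the URL for any document so its metadata can be spot-checked against the document page:

```python
from urllib.parse import quote

# Base path taken from the two links above.
ANNIS_META_DOC = "http://corpling.uis.georgetown.edu/annis-service/annis/meta/doc"


def annis_doc_meta_url(corpus: str, document: str) -> str:
    """Build the document-level metadata URL for one ANNIS document."""
    # Percent-encode each path segment so spaces or slashes in names can't break the path.
    return f"{ANNIS_META_DOC}/{quote(corpus, safe='')}/{quote(document, safe='')}"


print(annis_doc_meta_url("sahidica.nt", "1_Corinthians_10"))
# http://corpling.uis.georgetown.edu/annis-service/annis/meta/doc/sahidica.nt/1_Corinthians_10
```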

Luckritz Marquis is listed in the annotation metadata for both of these.

ctschroeder commented 9 years ago

Thanks for chasing this down. I notice that the metadata for 1 Cor 10 is actually the metadata for one of the AP; scroll to the bottom of the doc page: http://data.copticscriptorium.org/texts/new_testament/1_corinthians_10/sahidica (for some reason the links to the ANNIS service aren't working for me). The same is true for 1 Cor 11: http://data.copticscriptorium.org/texts/new_testament/1_corinthians_11/sahidica

I don't know where in the process this is happening or why. @amir-zeldes, maybe you can take a look?

amir-zeldes commented 9 years ago

I’ve tracked down the error and corrected it. The folders are next to each other on the ANNIS server, so it looks like an accidental drag-and-drop mouse twitch, I’m afraid; the folders in the GitHub repo don’t have the error.

(Sent as an email reply to the GitHub notification for ctschroeder's comment of Friday, May 15, 2015.)

ctschroeder commented 9 years ago

thanks!