ALIADA / aliada-tool

Aliada tool implementation
GNU General Public License v3.0

RDFizer: Named Entity Recognition #76

Closed agazzarini closed 9 years ago

agazzarini commented 9 years ago

As part of the final release we have to attach, somewhere in the conversion pipeline, a dedicated component for doing NLP analysis on some terms extracted from records. The NER results will be stored as RDF triples that will enrich the owning records.

agazzarini commented 9 years ago

I'm trying the Stanford NER library [1], which looks interesting. The only annoying thing is that it requires Java 8. So if, after the trial, we decide to go for it, we could:

a) put the library on a separate Tomcat instance that runs on Java 8 and provide the NER service through HTTP, or
b) upgrade our JVM and use the library directly within the RDFizer.
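Either option can be kept open if the RDFizer only talks to a small interface. The following is a minimal sketch under that assumption, not the project's actual code: `NerService` and `FixedListNerService` are hypothetical names, and the implementation is a stand-in that matches a fixed word list rather than calling the Stanford library or an HTTP endpoint.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class NerServiceSketch {

    /** Hypothetical abstraction: the RDFizer would depend only on this interface. */
    interface NerService {
        /** Returns the entity mentions found in the given text. */
        List<String> extractEntities(String text);
    }

    /**
     * Stand-in implementation for illustration only: it matches a fixed word list.
     * A real implementation would either call the NER library in-process (option b)
     * or send the text to a remote NER service over HTTP (option a).
     */
    static class FixedListNerService implements NerService {
        private final List<String> knownNames;

        FixedListNerService(String... knownNames) {
            this.knownNames = Arrays.asList(knownNames);
        }

        @Override
        public List<String> extractEntities(String text) {
            List<String> found = new ArrayList<>();
            for (String name : knownNames) {
                if (text.contains(name)) {
                    found.add(name);
                }
            }
            return found;
        }
    }

    public static void main(String[] args) {
        NerService ner = new FixedListNerService("Florence", "Alessandro Manzoni");
        System.out.println(ner.extractEntities("Alessandro Manzoni never lived in Florence."));
    }
}
```

With this shape, switching between a) and b) would mean swapping the class behind `NerService`, while the calling code stays unchanged.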

agazzarini commented 9 years ago

@idoiamurua @scanbit what do you think about the previous comment? I completed a first trial of that NER and, unlike other similar software (e.g. Stanbol, OpenNLP), it works simply and without much overhead. The product comes with several knowledge base databases that can detect nouns and entities in our records as well (I tried some text in Spanish and in Hungarian).

However, as I said, it requires Java 8. Would upgrading our JVM cause any specific trouble in your modules?

idoiamurua commented 9 years ago

Will we have any problems with Tomcat7 and Java 8?

agazzarini commented 9 years ago

No, I don't think so.

idoiamurua commented 9 years ago

@eturienzo What about the Struts library?

agazzarini commented 9 years ago

Theoretically there shouldn't be any problem at all. However, it would be great if we could try our modules with Java 8.

In addition, the EOL of Java 7 is scheduled for next April, so it will have a short life.

The alternative would be an additional Tomcat instance providing the NER services through HTTP.

idoiamurua commented 9 years ago

Shall we change the Java version in the Aliada POM to 8? If so, is it enough to change the following lines in the POM?

    <maven.compiler.source>1.7</maven.compiler.source>
    <maven.compiler.target>1.7</maven.compiler.target>
    <java.version>1.7</java.version>

agazzarini commented 9 years ago

Yes. That should be enough (assuming you have Java 8 installed)

idoiamurua commented 9 years ago

@scanbit @eturienzo @xmolero Could you try the UI (Struts) with Java 8?

eturienzo commented 9 years ago

I am going to check it with the UI. I will let you know when I finish.

idoiamurua commented 9 years ago

OK. I will try my code too.

eturienzo commented 9 years ago

We have no problem with the change to Java 8.

agazzarini commented 9 years ago

OK, @idoiamurua, if I remember correctly, the interlinking module uses standalone Java processes. If you have problems with Java 8, you could always run them with Java 7.

Regardless of the path we follow, let's assume that for one record I detect two names and one place: where do we put this information? Do we have to use the EFRBROO ontology? I think this information should be kept apart, because it isn't controlled and may contain mistakes.

idoiamurua commented 9 years ago

Yes. I do not think there is any problem with the Links Discovery Client application module. I have just built all ALIADA modules with Java 8 on my PC and it seems to cause no problems. I will change the Java version of the ALIADA POM on GitHub.

agazzarini commented 9 years ago

@cgareta @tpossemato could you please reply to this topic with a list of the most representative tags we should consider as NER input?

I need these lists separated by input format: one for MARC, one for LIDO and one for DC.

thx

agazzarini commented 9 years ago

@idoiamurua I'd like to know your opinion about where to store the named entities extracted with NER...do you know if any of the imported ontologies do have a suitable predicate for that?

agazzarini commented 9 years ago

The knowledge model I'm using is able to detect

Ideally, we should transform those entities into triples while keeping their type, but I'm not sure how to proceed. The ontology doesn't seem to provide suitable predicates for such a scenario.

idoiamurua commented 9 years ago

I have sent an e-mail about this issue to Marta, who is not in the office today. I proposed the following options to her:

I hope to have an answer tomorrow.

agazzarini commented 9 years ago

Great! Many thanks. How can we distinguish between a Person and a Place?

idoiamurua commented 9 years ago

I have just been talking with Marta, because she created the Aliada ontology, and we discussed how we can add these new subjects. She will look into it more deeply, but she is busy with other things right now. She hopes to answer by next Monday. We have found two other possible properties that already exist in the Aliada ontology:

  • foaf:topic
    • domain: foaf:Document. Problem: foaf:Document is not yet aligned with the ECRM/FRBRoo ontology and the classes we use, frbr:F3_Manifestation_Product_Type or ecrm:E19_Physical_Object.
    • range: Thing
  • ecrm:P67_refers_to
    • domain: ecrm:E89_Propositional_Object. Problem: our instances are of type frbr:F3_Manifestation_Product_Type or ecrm:E19_Physical_Object, not subclasses of ecrm:E89_Propositional_Object.
    • range: ecrm:E1_CRM_Entity

agazzarini commented 9 years ago

Absolutely no rush at all. I have to prepare things in the codebase first, so the triple production is the last thing.

Many thanks for your help.

agazzarini commented 9 years ago

About ecrm:P67_refers_to: I think it's not a problem if our entity doesn't belong to that domain. It will, as soon as we say

:xyz ecrm:P67_refers_to E1_CRM_Entity

In this way, regardless of what xyz is, it also becomes an E89_Propositional_Object.

What about the object? Imagine that I detected the place "Florence": how does the E1_CRM_Entity hold this literal value? How can we type that entity so that we can distinguish it from another one, e.g. the person "Alessandro Manzoni"?
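The domain argument above is the standard RDFS domain rule: if a property has domain C, any subject used with that property is inferred to be a C. A minimal sketch of that rule follows, assuming triples are held as plain string arrays; the class name and the `:florence` identifier are made up for illustration, and a real reasoner does much more than this.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DomainInferenceSketch {

    /**
     * Applies the RDFS domain rule: for every triple (s p o) where p has a
     * declared domain C, the statement "s rdf:type C" is inferred.
     */
    static List<String> inferTypes(List<String[]> triples, Map<String, String> domains) {
        List<String> inferred = new ArrayList<>();
        for (String[] triple : triples) {
            String domainClass = domains.get(triple[1]);
            if (domainClass != null) {
                inferred.add(triple[0] + " rdf:type " + domainClass);
            }
        }
        return inferred;
    }

    public static void main(String[] args) {
        // Domain declaration taken from the discussion above.
        Map<String, String> domains = new HashMap<>();
        domains.put("ecrm:P67_refers_to", "ecrm:E89_Propositional_Object");

        // ":florence" is a hypothetical identifier for the detected entity.
        List<String[]> triples = new ArrayList<>();
        triples.add(new String[] {":xyz", "ecrm:P67_refers_to", ":florence"});

        System.out.println(inferTypes(triples, domains));
    }
}
```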

idoiamurua commented 9 years ago

I have just run a trial, creating the following two objects:

    <owl:NamedIndividual rdf:about="&efrbroo;NLPTrial_Object">
        <rdf:type rdf:resource="&ecrm;E19_Physical_Object"/>
        <rdf:type rdf:resource="&ecrm;E89_Propositional_Object"/>
    </owl:NamedIndividual>
    <owl:NamedIndividual rdf:about="&efrbroo;NLPTrial_Manif">
        <rdf:type rdf:resource="&ecrm;E89_Propositional_Object"/>
        <rdf:type rdf:resource="&efrbroo;F3_Manifestation_Product_Type"/>
    </owl:NamedIndividual>

Then I ran the OWLMicroReasoner, which reported the following errors:

   Conflicts
        - Error ("conflict"): "Individual a member of disjoint classes"
   Culprit = http://erlangen-crm.org/efrbroo/NLPTrial_Object
   Implicated node: http://erlangen-crm.org/current/E18_Physical_Thing
   Implicated node: http://erlangen-crm.org/current/E28_Conceptual_Object

The two classes are disjoint, so it would not be possible. There would be no problem with the F3_Manifestation_Product_Type.
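The conflict can be reproduced with a very small check. The sketch below assumes only the two superclass links that the reasoner output implicates (E19_Physical_Object under E18_Physical_Thing, E89_Propositional_Object under E28_Conceptual_Object) and the single disjointness between E18 and E28. It is an illustration of the reasoning, not a replacement for the reasoner, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

class DisjointnessCheckSketch {

    // Direct superclass links, reduced to the two needed for this example.
    static final Map<String, String> SUPERCLASS = new HashMap<>();
    static {
        SUPERCLASS.put("ecrm:E19_Physical_Object", "ecrm:E18_Physical_Thing");
        SUPERCLASS.put("ecrm:E89_Propositional_Object", "ecrm:E28_Conceptual_Object");
    }

    /** Returns the given types together with all their known superclasses. */
    static Set<String> withSuperclasses(Collection<String> types) {
        Set<String> all = new LinkedHashSet<>();
        for (String type : types) {
            for (String c = type; c != null; c = SUPERCLASS.get(c)) {
                all.add(c);
            }
        }
        return all;
    }

    /** True if the types imply membership in both disjoint classes. */
    static boolean hasConflict(Collection<String> assertedTypes) {
        Set<String> all = withSuperclasses(assertedTypes);
        return all.contains("ecrm:E18_Physical_Thing")
                && all.contains("ecrm:E28_Conceptual_Object");
    }

    public static void main(String[] args) {
        System.out.println(hasConflict(Arrays.asList(
                "ecrm:E19_Physical_Object", "ecrm:E89_Propositional_Object")));
        System.out.println(hasConflict(Arrays.asList(
                "efrbroo:F3_Manifestation_Product_Type", "ecrm:E89_Propositional_Object")));
    }
}
```

The first call corresponds to NLPTrial_Object and the second to NLPTrial_Manif from the trial above.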

agazzarini commented 9 years ago

This is another interesting question: what is the target entity? Work? Expression? Manifestation?

I think @cgareta or @tpossemato could help us

idoiamurua commented 9 years ago

To distinguish between a Person, an Organization and a Place, we should create instances of type ecrm:E21_Person, ecrm:E39_Actor or ecrm:E53_Place, respectively. Then we add the corresponding object properties, ecrm:P131_is_identified_by, ecrm:P131_is_identified_by and ecrm:P87_is_identified_by respectively, which is where the literals found by the NLP processor would be inserted.
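As a rough illustration of this proposal, the sketch below maps each NER category to the class and property named above and prints Turtle-like statements. The enum, the method name and the `:entity1` subject are hypothetical. How the literal is finally attached (directly to the property, as written here, or through an intermediate node) is not settled in this thread, so treat the second statement as a placeholder.

```java
import java.util.Arrays;
import java.util.List;

class NerEntityMappingSketch {

    /** NER categories with the class and property proposed in the comment above. */
    enum EntityType {
        PERSON("ecrm:E21_Person", "ecrm:P131_is_identified_by"),
        ORGANIZATION("ecrm:E39_Actor", "ecrm:P131_is_identified_by"),
        PLACE("ecrm:E53_Place", "ecrm:P87_is_identified_by");

        final String rdfClass;
        final String property;

        EntityType(String rdfClass, String property) {
            this.rdfClass = rdfClass;
            this.property = property;
        }
    }

    /** Builds two Turtle-like statements: the type and the identifying literal. */
    static List<String> describe(String subject, EntityType type, String literal) {
        // Escape backslashes and double quotes so the literal stays well-formed.
        String escaped = literal.replace("\\", "\\\\").replace("\"", "\\\"");
        return Arrays.asList(
                subject + " rdf:type " + type.rdfClass + " .",
                subject + " " + type.property + " \"" + escaped + "\" .");
    }

    public static void main(String[] args) {
        for (String line : describe(":entity1", EntityType.PLACE, "Florence")) {
            System.out.println(line);
        }
        for (String line : describe(":entity2", EntityType.PERSON, "Alessandro Manzoni")) {
            System.out.println(line);
        }
    }
}
```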

cgareta commented 9 years ago

Good question: work is the main entity, but catalogues are full of manifestations without tags at work level... Manifestation + item is the most common situation.

agazzarini commented 9 years ago

Hi Cristina, many thanks for your prompt response.

I think it would be great to provide a simple map between tag/subfield and the corresponding entities. For instance,

tag 500$a = Manifestation

That means: the entities resulting from a NER service execution that received the text content of 500$a as input must be assigned to the Manifestation level.
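Such a map could be as simple as a lookup keyed by tag and subfield. The sketch below is only an assumption about its shape: the class and method names are hypothetical, and the single entry is the 500$a example from this comment, not an agreed mapping.

```java
import java.util.HashMap;
import java.util.Map;

class NerTargetLevelSketch {

    // Key: MARC tag + "$" + subfield code. Value: the level that receives the entities.
    private static final Map<String, String> LEVEL_BY_FIELD = new HashMap<>();
    static {
        // The only entry given in the comment above; real entries would come
        // from the mapping provided by the librarians.
        LEVEL_BY_FIELD.put("500$a", "Manifestation");
    }

    /** Returns the level for the given tag and subfield, or null if it is not mapped. */
    static String levelFor(String tag, char subfield) {
        return LEVEL_BY_FIELD.get(tag + "$" + subfield);
    }

    public static void main(String[] args) {
        System.out.println(levelFor("500", 'a'));
        System.out.println(levelFor("999", 'z'));
    }
}
```

Returning null for unmapped fields keeps the decision about a default level outside the lookup.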

Best Andrea

tpossemato commented 9 years ago

Hi Andrea, attached is a document with a mapping of the most used MARC 21 tags for Work, Expression and Person/Family/Corporate body. All tags not included in the sheets, if they come from the MARC 21 bibliographic format, have to be considered useful for identifying manifestations.

So, generally speaking:

  • for Work, Expression and Person/Family/Corporate body: you need to consider at least the authority data plus the bibliographic data reported in the sheets;
  • for manifestations: you need to consider above all the bibliographic tags, with the exception of the tags reported in the sheets for other entities.

Let me know if this is clear enough or if you prefer a new mapping schema.

Bye

Tiziana

agazzarini commented 9 years ago

Tiziana, GitHub doesn't allow attachments in posts. Please send me the file and I will add it to our repository.

cgareta commented 9 years ago

That is in the mapping I sent. Do you need something simpler? A two-column table?

Best

Cristina

cgareta commented 9 years ago

Please also consider the entities related to subjects:

  • Concept
  • Place
  • Event
  • Object

These entities are related to the work.

Cristina

agazzarini commented 9 years ago

OK @cgareta, yes, I see.

Before I look at Tiziana's attachment: your file lists a lot of tags but, as you know, we don't have to process every tag with NLP. For that kind of recognition we only need the tags that hold a reasonable amount of text, like notes.

Could you please list them?

Best, Andrea

cgareta commented 9 years ago

Are you going to process notes? Main notes are:

| NOTE TYPE | MARC TAG | SUBFIELD | FRBR ENTITY | RELATIONSHIP |
|---|---|---|---|---|
| 500 - General Note (R) | | | | |
| 501 - With Note (R) | | | | |
| 502 - Dissertation Note (R) | | | | |
| 504 - Bibliography, etc. Note (R) | 504 | a | Work | Form of work |
| 505 - Formatted Contents Note (R) | 505 | a | Expression | Summarization of content |
| 506 - Restrictions on Access Note (R) | | | | |
| 507 - Scale Note for Graphic Material (NR) | | | | |
| 508 - Creation/Production Credits Note (R) | | | | |
| 510 - Citation/References Note (R) | | | | |
| 511 - Participant or Performer Note (R) | 511 | a | Manifestation | Statement of responsibility |
| 513 - Type of Report and Period Covered Note (R) | | | | |
| 514 - Data Quality Note (NR) | | | | |
| 515 - Numbering Peculiarities Note (R) | | | | |
| 516 - Type of Computer File or Data Note (R) | | | | |
| 518 - Date/Time and Place of an Event Note (R) | | | | |
| 520 - Summary, etc. (R) | 520 | a | Expression | Summarization of content |
| 521 - Target Audience Note (R) | 521 | a | Work | Intended audience |
| 522 - Geographic Coverage Note (R) | 522 | a | Place | Term for place |
| 524 - Preferred Citation of Described Materials Note (R) | | | | |
| 525 - Supplement Note (R) | | | | |
| 526 - Study Program Information Note (R) | | | | |
| 530 - Additional Physical Form available Note (R) | 530 | a | Manifestation | Alternate |
| 533 - Reproduction Note (R) | | | | |
| 534 - Original Version Note (R) | | | | |
| 535 - Location of Originals/Duplicates Note (R) | | | | |
| 536 - Funding Information Note (R) | | | | |
| 538 - System Details Note (R) | | | | |
| 540 - Terms Governing Use and Reproduction Note (R) | 540 | a | Manifestation | Access restrictions |
| 541 - Immediate Source of Acquisition Note (R) | 541 | a | Item | Provenance |
| 542 - Information Relating to Copyright Status (R) | | | | |
| 544 - Location of Other Archival Materials Note (R) | | | | |

545 - Biographical or Historical Data (R) Full | Concise

545

a

Person/Corp

Biography/history*

546 - Language Note (R) Full | Concise

546

a

Expression

Language of expression

547 - Former Title Complexity Note (R) Full | Concise

550 - Issuing Body Note (R) Full | Concise

552 - Entity and Attribute Information Note (R) Full | Concise

555 - Cumulative Index/Finding Aids Note (R) Full | Concise

556 - Information About Documentation Note (R) Full | Concise

561 - Ownership and Custodial History (R) Full | Concise

562 - Copy and Version Identification Note (R) Full | Concise

563 - Binding Information (R) Full | Concise

565 - Case File Characteristics Note (R) Full | Concise

567 - Methodology Note (R) Full | Concise

580 - Linking Entry Complexity Note (R) Full | Concise

581 - Publications About Described Materials Note (R) Full | Concise

583 - Action Note (R) Full | Concise

584 - Accumulation and Frequency of Use Note (R) Full | Concise

585 - Exhibitions Note (R) Full | Concise

586 - Awards Note (R) Full | Concise

588 - Source of Description Note (R) Full | Concise

59X - Local Notes Full | Concise
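
The note tags listed above that carry an FRBR entity could be kept in a small lookup table, which would also make it easy to switch NER on tag by tag. This is only a sketch under assumptions: the names `NOTE_MAPPING`, `NoteTarget` and `ner_candidates`, the `(tag, subfield, text)` tuple shape and the sample record are invented for illustration and are not part of the RDFizer.

```python
from typing import NamedTuple


class NoteTarget(NamedTuple):
    subfield: str
    frbr_entity: str
    relationship: str


# Only the note tags that have an FRBR entity in the list above.
NOTE_MAPPING = {
    "504": NoteTarget("a", "Work", "Form of work"),
    "505": NoteTarget("a", "Expression", "Summarization of content"),
    "511": NoteTarget("a", "Manifestation", "Statement of responsibility"),
    "520": NoteTarget("a", "Expression", "Summarization of content"),
    "521": NoteTarget("a", "Work", "Intended audience"),
    "522": NoteTarget("a", "Place", "Term for place"),
    "530": NoteTarget("a", "Manifestation", "Alternate"),
    "540": NoteTarget("a", "Manifestation", "Access restrictions"),
    "541": NoteTarget("a", "Item", "Provenance"),
    "545": NoteTarget("a", "Person/Corp", "Biography/history*"),
    "546": NoteTarget("a", "Expression", "Language of expression"),
}


def ner_candidates(fields):
    """Yield (frbr_entity, text) for every field worth sending to NER.

    `fields` is a hypothetical iterable of (tag, subfield, text) tuples.
    """
    for tag, subfield, text in fields:
        target = NOTE_MAPPING.get(tag)
        if target is not None and target.subfield == subfield and text.strip():
            yield target.frbr_entity, text.strip()


# Sample data only, not taken from a real catalogue.
record = [
    ("245", "a", "On the origin of species"),
    ("520", "a", "Darwin's account of natural selection."),
    ("500", "a", "Includes index."),
]
print(list(ner_candidates(record)))
```

Tags without an entity in the list (such as 500) are skipped here; whether they should be processed at all is still an open question in this thread.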

agazzarini commented 9 years ago

Sorry, we need another dimension: the source format. The tags are valid for MARC, but we need the same for LIDO and DC, too.

cgareta commented 9 years ago

OK, I will try to finish the revision of the mapping from DC, but I don’t know whether LIDO includes notes like the MARC notes.

tpossemato commented 9 years ago

Andrea, please check whether the file I sent you by e-mail helps: if so, I'll continue tomorrow and complete the report.

idoiamurua commented 9 years ago

Here is the mapping to use to save the named entities extracted with NLP:

Option 1: The extracted named entities can be instances of skos:Concept (in the ALIADA ontology skos:Concept is a subclass of E55_Type, and so it is an E1_CRM_Entity). The Work/Expression would be an instance of E89_PropositionalObject, and they would be related as follows:

   E89_PropositionalObject  P129_is_about  E1_CRM_Entity
   (work, expression)                      (skos:Concept)

Option 2: The extracted named entities can be instances of skos:Concept (in the ALIADA ontology skos:Concept is a subclass of E55_Type). So, they would be related as follows:

   E1_CRM_Entity                                P137_exemplifies  skos:Concept
   (manifestation, item, LIDO physicalObject)                     (skos:Concept)

To differentiate among Persons, Places, etc., Marta says that we should not make it too complicated, as I mentioned to you before. That is, we do not need to create instances of Person, Place, etc.; all the extracted named entities will be instances of skos:Concept, and we will group them using the skos:Collection class. So we will create the following collections:

   PersonsCollection rdf:type skos:Collection
   PlacesCollection rdf:type skos:Collection
   ????.....

When a named entity is extracted (e.g. a Person name like Darwin), the following triples are generated:

  Darwin rdf:type skos:Concept
  PersonsCollection skos:member Darwin
  Work1 rdf:type ecrm:E89_PropositionalObject 
  Work1 ecrm:P129_is_about Darwin

Finally, I do not know whether you (Andrea, Cristina and Tiziana) have agreed on the level at which to add the extracted named entities (Work, Expression or Manifestation), but the 2 options explained here let you do it at whichever level you decide.
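
The two options could be sketched roughly as follows. This is a minimal illustration, not the RDFizer code: the function name, the category-to-collection table and the `Manifestation1` identifier are assumptions, and only `PersonsCollection` and `PlacesCollection` are named in this comment.

```python
# Hypothetical mapping from a NER category to the skos:Collection it joins.
COLLECTIONS = {"PERSON": "PersonsCollection", "LOCATION": "PlacesCollection"}


def entity_triples(entity, category, target, level):
    """Return (subject, predicate, object) triples for one extracted entity.

    level "work" or "expression" uses P129_is_about; any other level
    (manifestation, item, LIDO physical object) uses P137_exemplifies.
    """
    triples = [
        (entity, "rdf:type", "skos:Concept"),
        (COLLECTIONS[category], "skos:member", entity),
    ]
    if level in ("work", "expression"):
        triples.append((target, "rdf:type", "ecrm:E89_PropositionalObject"))
        triples.append((target, "ecrm:P129_is_about", entity))
    else:
        triples.append((target, "ecrm:P137_exemplifies", entity))
    return triples


for s, p, o in entity_triples("Darwin", "PERSON", "Work1", "work"):
    print(s, p, o)
```

With the `Work1` call this prints the same four triples as the Darwin example in this comment; passing a manifestation or item as `target` with another `level` value switches to `ecrm:P137_exemplifies`.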

cgareta commented 9 years ago

Hi all,

If we use only that level of entities, we will lose a lot of attributes related to the manifestation (and ultimately to the bibliographic record, because stored bibliographic records are usually manifestations), as well as the relationship to the item entity. I think that it’s not a good idea… but I need to read the explanation more carefully to understand the problem.

@tiziana, what do you think about it?

Cristina

idoiamurua commented 9 years ago

@cgareta We are not going to remove anything. We are going to add new properties to the work, manifestation, item or whatever you decide to associate with the extracted named entity.

agazzarini commented 9 years ago

Hi @idoiamurua @marta, many thanks, sounds really good!

idoiamurua commented 9 years ago

Regarding LIDO, the NLP should be applied to the following data property:

ecrm:E19_Physical_Object ecrm:P3_has_note xsd:string

These triples will be generated from the following tags in the LIDO records:

  <lido:lidoWrap xmlns:lido="http://www.lido-schema.org">
      <lido:lido>
           <lido:descriptiveMetadata xml:lang="en">
                 <lido:objectIdentificationWrap>
                     <lido:objectDescriptionWrap>
                           <lido:objectDescriptionSet>
                                 <lido:descriptiveNoteValue xml:lang="en" lido:type ="physical-description"> Sculpture of Mozart …..</lido:descriptiveNoteValue >
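
As a rough sketch of how the note text could be pulled out of such a record before it goes to NER: the snippet below uses Python's standard `xml.etree.ElementTree`, and `SAMPLE` is a closed, minimal version of the fragment above. The `descriptive_notes` helper is a hypothetical name, not the RDFizer's own code.

```python
import xml.etree.ElementTree as ET

LIDO_NS = "http://www.lido-schema.org"  # namespace taken from the fragment above

# A closed, minimal version of the fragment above (sample data only).
SAMPLE = """
<lido:lidoWrap xmlns:lido="http://www.lido-schema.org">
  <lido:lido>
    <lido:descriptiveMetadata xml:lang="en">
      <lido:objectIdentificationWrap>
        <lido:objectDescriptionWrap>
          <lido:objectDescriptionSet>
            <lido:descriptiveNoteValue xml:lang="en" lido:type="physical-description">Sculpture of Mozart</lido:descriptiveNoteValue>
          </lido:objectDescriptionSet>
        </lido:objectDescriptionWrap>
      </lido:objectIdentificationWrap>
    </lido:descriptiveMetadata>
  </lido:lido>
</lido:lidoWrap>
"""


def descriptive_notes(xml_text):
    """Return the text of every lido:descriptiveNoteValue element."""
    root = ET.fromstring(xml_text)
    tag = "{%s}descriptiveNoteValue" % LIDO_NS
    return [el.text.strip() for el in root.iter(tag) if el.text and el.text.strip()]


print(descriptive_notes(SAMPLE))
```

Each returned string would become the `ecrm:P3_has_note` value of the physical object and the input text for the NER step.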

agazzarini commented 9 years ago

Hi @idoiamurua, sorry, I lost your email about this question: a few days ago you asked where we are going to put the literal for the extracted entities. Now that I'm implementing it, I see that the entity is wrapped as a concept. As a consequence, we should use skos:prefLabel instead of ecrm:P3_has_note.

Is that correct?

idoiamurua commented 9 years ago

Yes. I was about to reply that it is better to use skos:prefLabel. The difference from ecrm:P3_has_note is that skos:prefLabel is an annotation property, whereas ecrm:P3_has_note is a data property. The e-mail you were looking for is in the following issue: https://github.com/ALIADA/aliada-tool/issues/37

agazzarini commented 9 years ago

@idoiamurua I'm lost with OPTION 2 above... could you please write down an example where the extracted entity is attached to a LIDO object?

idoiamurua commented 9 years ago

  Darwin rdf:type skos:Concept
  PersonsCollection skos:member Darwin
  E19_Physical_Object1 ecrm:P137_exemplifies Darwin
  Darwin skos:prefLabel "Darwin"

That is, you do not need to create any other rdf:type triple for the LIDO object, just add the ecrm:P137_exemplifies object property (the same applies to the manifestations).
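
For illustration, the same four triples could be serialised as N-Triples like this. It is a sketch under assumptions: `BASE` is a made-up base URI for instance identifiers and `lido_entity_ntriples` is a hypothetical helper; only the RDF, SKOS and Erlangen CRM namespaces are real.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
ECRM = "http://erlangen-crm.org/current/"  # Erlangen CRM namespace
BASE = "http://example.org/aliada/"  # hypothetical base for instance URIs


def literal(text):
    """Escape a string as an N-Triples literal."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
    return '"%s"' % escaped


def lido_entity_ntriples(entity_id, label, collection_id, object_id):
    """Build the four triples that attach an extracted entity to a LIDO object."""
    entity = "<%s%s>" % (BASE, entity_id)
    lines = [
        "%s <%stype> <%sConcept> ." % (entity, RDF, SKOS),
        "<%s%s> <%smember> %s ." % (BASE, collection_id, SKOS, entity),
        "<%s%s> <%sP137_exemplifies> %s ." % (BASE, object_id, ECRM, entity),
        "%s <%sprefLabel> %s ." % (entity, SKOS, literal(label)),
    ]
    return "\n".join(lines)


print(lido_entity_ntriples("Darwin", "Darwin", "PersonsCollection", "E19_Physical_Object1"))
```

No `rdf:type` triple is emitted for the LIDO object itself, in line with the note above.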

agazzarini commented 9 years ago

+1 Many thanks!

agazzarini commented 9 years ago

With the last two pushes, the NLP branch (along with its feature) has been merged into master. I implemented it for both LIDO and MARC. For the latter, I tried to detect entities for only a few tags, because that processing is computationally very heavy. Consequently, I suggest that we first try to solve the performance issue raised by Adam and then extend the NER usage step by step, tag by tag.