TEIC / TEI

The Text Encoding Initiative Guidelines
https://www.tei-c.org

biblStruct for Patent citations #353

Closed TEITechnicalCouncil closed 8 years ago

TEITechnicalCouncil commented 12 years ago

We are implementing a project for encoding our patent and non-patent literature according to the TEI standard.

To do so, we need a very precise bibliographical reference for patent documents, which is very important in the patent literature. The current TEI standard does not allow us to encode patent bibliographical citations. Patents cite other patent documents using a very well defined encoding, whose main elements are:

- Identification of a Patent Authority

Therefore, we would need to have the following structure in TEI for encoding the bibliographic information of patents:

    <biblStruct type="patent¦utilityModel¦designPatent¦plant" status="application¦publication">
      <monogr>
        <authority>
          <orgName type="national¦regional"></orgName>
        </authority>
        <idno type="docNumber"></idno>
        <date type="applicationDate¦publicationDate"></date>
        <imprint>
          <idno type="kindCode"></idno>
        </imprint>
      </monogr>
    </biblStruct>

I would like to add some examples, to show the importance of having this structure for our project:

1) In patent documents, citations of other patents are normally among the most important pieces of information. Such a citation may identify priority patents, related patents, or simply patents cited in the document. The bibliographical reference to these patents is given without any title, using the standard patent bibliographical codification. See the following examples (I attached a file with the corresponding images):

E1) In this text (from a patent) another patent is cited by: "Japanese Patent Laid-Open No. 223883/1974"

E2) In this example you can see how the bibliographical information of patents is normally provided:

E3) Non-patent literature also very often uses this kind of citation; see the following example:

2) I would also like to point out that several citation style manuals explicitly avoid using the title and other information when citing patents:

Bluebook Citation: U.S. Patent No. 6,885,550 (issued Apr. 26, 2005).

APA Citation: Williams, D. (2005). U.S. Patent No. 6,885,550. Washington, DC: U.S.

ACS Citation: Williams, D. U.S. Patent 6,885,550, 2005.

Original comment by: sf_user_posejavier

TEITechnicalCouncil commented 8 years ago

This issue was originally assigned to SF user: kshawkin Current user is: kshawkin

TEITechnicalCouncil commented 12 years ago

Patent citation Examples

Original comment by: sf_user_posejavier

TEITechnicalCouncil commented 12 years ago

I see this as quite convincing. Would it make sense, once the change in biblStruct is made, to add such a patent citation example to the Guidelines? It would be good to show the variety of applications.

Original comment by: @laurentromary

TEITechnicalCouncil commented 12 years ago

Original comment by: @jamescummings

TEITechnicalCouncil commented 11 years ago

Thank you, Javier, for providing so much background information.

Using the encoding structure you propose, it appears that the following changes would need to be made to the P5 content models:

a) Add @status to <biblStruct> by way of adding this element to the att.docStatus class.

b) Allow <authority> as a child of <monogr>.

In addition, we would add some examples of patent citations to section 3.11 of the Guidelines.

While you give various possible values of @type and @status on various elements, I think you'll agree that we shouldn't limit the values on these elements since these elements can be used for other things.

Javier, does this all sound right?

I see no reason not to implement this. If other members of Council agree with this, I suggest we do two things:

  1. One of the Council members can implement the changes to the content models.

  2. We ask Javier to provide suggested changes to the prose of section 3.11.

Original comment by: @kshawkin

TEITechnicalCouncil commented 11 years ago

Hi, many thanks for your answer. Basically the changes that you enumerate are right but there are also two additional small changes:

1) Currently, the element <idno> is only allowed inside <monogr> IF it comes after the element <title>. This seems to be an arbitrary restriction. In the case of patents, the bibliographic citation usually does not include the title of the patent, so it should also be allowed to have the element <idno> inside <monogr> without restrictions.

2) It would also be necessary to allow the element <idno> inside <imprint> in order to encode the patent code.

Original comment by: sf_user_posejavier

TEITechnicalCouncil commented 11 years ago

Pardon the delay in getting back to this. I completely missed that your sample was suggesting <idno> in two separate places where it is not currently allowed. However:

1) How would the semantics of biblStruct/monogr/idno be different from biblStruct/idno? That is, why exactly did you want to make <idno> a child of <monogr> rather than a sibling for the patent number? I am quite reluctant to allow this element in two places for fear of causing the kind of confusion we have for <biblScope> in <biblStruct> (see http://purl.org/TEI/FR/3555190 in case you're interested).

2) Why do you want to put the patent code inside <imprint> instead of as a sibling of the other <idno>? You could distinguish the two with @type.

Original comment by: @kshawkin

TEITechnicalCouncil commented 11 years ago

Hi, many thanks for your comments.

I will try to explain why we propose to put the patent kindCode (<idno type="kindCode"></idno>) in the element <imprint>:

The element <imprint> groups information relating to the publication or distribution of a bibliographic item.

This is exactly what the kindCode of a patent document means because it informs about the publication or distribution of the patent (!).

Basically, a patent is identified by a set of metadata (essentially a patent authority and a patent number), and the patent publication is further characterized by the additional "kindCode". Therefore, the kindCode provides the information relating to the publication or distribution of the patent (for example, whether it is a patent published after a search by the patent examiner, or a patent published during the examination procedure...). A patent, during its life cycle, is published "physically" several times, each version corresponding to additional corrections and refinements. It thus appears appropriate to put the identifier corresponding to the publication under the element grouping information related to publication of the bibliographical item, i.e. <imprint>. One can refer to a patent, or to a particular patent publication.

Regarding your two questions I will now answer them, the arguments being basically already provided in the explanation above:

Regarding (1), in our proposal, the idno corresponding to the patent number (<idno type="docNumber">...</idno>) specifies the patent as a separate stand-alone bibliographical entity, which corresponds to an independent item that can be cited as such. This sort of bibliographic information is normally grouped in the monogr section together with information like inventors, similarly to a book or a report. The patent number is actually relatively similar to a volume of a serial publication, the series being the granting patent authority (e.g. patent publication 000001 from the USPTO). It does not appear consistent to us to put a semantically similar idno under biblStruct for a patent, and under monogr for a book.

Regarding (2), the TEI indicates that <imprint> groups information relating to the publication or distribution of a bibliographic item. As explained before, a patent is identified by a set of metadata, and a patent publication by the additional "kind code". A patent, during its life cycle, is published "physically" several times, each version corresponding to additional corrections and refinements. It thus appears appropriate to put the identifier corresponding to the publication under the element grouping information related to publication of the bibliographical item, i.e. <imprint>. One can refer to a patent, or to a particular patent publication.

Original comment by: sf_user_posejavier

TEITechnicalCouncil commented 11 years ago

I see what you are saying about (1). Now that I look over section 3.11.1 of the Guidelines, I am reminded that even in a simple <biblStruct>, usually all components of the citation are wrapped in <monogr>. While a few elements are allowed outside of <analytic>, <monogr>, and <series>, these appear to be for exceptional purposes where the information outside of these elements refers to more than one of them. So I now agree that it makes sense for a patent number to be inside <monogr>. And since you have patent citations that lack titles, we should no longer require <title> inside <monogr> to support this usage.

Regarding (2), thank you for the explanation of what a patent kindCode is. You hadn't actually explained it before, and nothing about the term "kind code" indicated to me that it relates to the publication or distribution of the patent document. (I would have guessed that a kind of patent is a classification along the lines of "physical device", "business process", etc.) However, from your description, it sounds like such a code isn't really an "identifier used to identify some object" (from the definition of <idno>); rather, it's akin to how <term> is used within <keywords>, no? That is, I actually think it might make more sense to use <term> for (2). What do you think?

So at this point I am prepared to support the following four changes to P5 content models to support citations of patents:

a) Add @status to <biblStruct> by way of adding this element to the att.docStatus class.

b) Allow <authority> as a child of <monogr>.

c) No longer require <title> inside <monogr>.

d) Allow <term> as a child of <imprint>.

As before, if other members of Council agree with these changes, I suggest we do two things:

  1. One of the Council members can implement the changes to the content models.

  2. We ask Javier to provide suggested changes to the prose of section 3.11 and examples of patent citations illustrating (a) through (d) above.

Original comment by: @kshawkin

TEITechnicalCouncil commented 11 years ago

Javier, if it's easier to discuss by Skype, I'd be happy to do that. My user is kshawkin. I could speak Wednesday or later.

Original comment by: @kshawkin

TEITechnicalCouncil commented 11 years ago

On a quick reading of this proposal, I am rather appalled by the suggestion that <title> should become optional. Any bibliographic entry must have a title, surely? Even in the abbreviated references above, would it be wrong to regard the title as (e.g.) "Us Patent No xxxxx" ?

Original comment by: @lb42

TEITechnicalCouncil commented 11 years ago

Original comment by: @lb42

TEITechnicalCouncil commented 11 years ago

Javier demonstrates that it is common practice to cite patents without reference to a title of the patent. While one might call this an "abbreviated reference", I don't see why the TEI should prevent someone from using biblStruct to record such a citation which is otherwise structured. We allow a <pubPlace> and <publisher> to be omitted from a citation in the case of, say, a journal article, in which these are not typically given. Why not do the same for patents?

For what it's worth, while Lou suggests regarding the title as "Us Patent No xxxxx", looking at the Word document attached to this ticket, I see that E2's actual title is more likely to be "System and Method for Natural Language Processing and Using Ontological Searches".

Original comment by: @kshawkin

TEITechnicalCouncil commented 11 years ago

Hi Kevin, sorry for the delay in replying. I was on holiday for the last two weeks (...in fact, just married ;-) Could we have a Skype call next week (Friday or the weekend)?

Original comment by: sf_user_posejavier

TEITechnicalCouncil commented 11 years ago

Javier: I've been at the TEI Council meeting most of this week and will be traveling this weekend as well. Please email me at kevin.s.hawkins@ultraslavonic.info with some suggested times (and your time zone) so we can figure out a time that might work on Tues., Sept. 25th, or later.

Original comment by: @kshawkin

TEITechnicalCouncil commented 11 years ago

For the record, the Technical Council decided at its September 2012 meeting in Oxford to no longer allow <idno> as a child of <biblStruct>: see http://purl.org/tei/fr/3565878 . (In the discussion below, we have already agreed that we would like to put <idno> inside of <monogr>, so this doesn't affect us.)

Original comment by: @kshawkin

TEITechnicalCouncil commented 11 years ago

Javier and I discussed by Skype. Aside from Lou's objection to omission of a <title> in a TEI-encoded citation (which both Javier and Kevin feel there is a good use case for), the only outstanding question is how to encode the "kind code". This contains a coding used by a particular patent office to note the status of a document in the application and publication process. The codes vary between patent offices. At the European Patent Office, there are four or five codes that apply to patent applications and four or five which apply to patent publications. (Javier will provide the kind codes used by the European Patent Office in a comment on this ticket to make this discussion more concrete.) Patents are sometimes cited as an application or publication (the value of biblStruct@status) without reference to the kind code, so we can't simply put the kind code in biblStruct@status.

Javier feels that the kind code relates to the publication of the patent and therefore belongs inside <imprint>. Kevin suggested imprint@status, but Javier said that kind codes feel to him more like content than an attribute value. If there is a use case where you might want to use markup within a kind code (for example, if you are transcribing patent citations that include kind codes from a source document and want to use <sic> or <corr>), then it definitely needs to be in an element.

(If we definitely agree not to use imprint@status for kind codes, I wonder whether we should use imprint@status for what is currently on biblStruct@status since that also relates to the publication of the patent.)

Let's say for now that the kind code would be included as the content of some element inside of <imprint>. Do we use <idno>, as Javier suggests, or <term>, as Kevin suggests? Javier suggests <idno> because the element definition says "supplies any form of identifier used to identify some object, such as a bibliographic item, a person, a title, an organization, etc. in a standardized way." Kevin explained in reply that he's only ever known an <idno> used in a way in which the content of this element identifies a single entity, whereas kind codes are actually standardized terms from a typology which don't identify any particular entity. For that, Kevin feels that <term> is the usual way this is done in TEI (at least when they occur inside <keywords> in the header).

Original comment by: @kshawkin

TEITechnicalCouncil commented 11 years ago

Regarding the proposal of adding the kindCode as an attribute of the <imprint> I think it is not the best place for the following reasons:

1) Most patent authorities issue more than one document for any particular patent. These sequential documents often keep the same number, so they are distinguished by adding a letter immediately after the number, called the kind code. The kind code is therefore one of the four elements used to identify a patent document (Patent Authority + Patent Number + Date + Kind Code). If the kind code were stored as an attribute of <imprint>, then <imprint> would be an empty element carrying the kind code only as an attribute. This seems somewhat strange, because we would have an empty <imprint> (which is not currently allowed by the TEI Guidelines).

2) Even if an empty <imprint> were allowed, storing the kind code as an attribute would rule out functionality such as the following. In most offices the kind code is composed of one letter and one number (for example A2, C1, ...; for more detailed information see http://www.delphion.com/help/kindcodes). The letter and the number each carry information about a particular aspect of the current situation of the document, so it could be convenient to encode the kind code as two separate items, the letter and the number. If the kind code is stored as a whole in an attribute, this won't be possible, but if it is stored as a child of <imprint>, then the letter and the number composing the kind code could be encoded separately at a finer grain. According to common XML practice, if the information is expressed in a structured form, especially if the structure may be extensible, elements should be used; if the information is expressed as an atomic token, an attribute can be used (see for example http://www.ibm.com/developerworks/xml/library/x-eleatt/index.html). In this case, the kind code IS NOT atomic information: in most cases it is composed of one letter and one number which have specific meanings. Therefore, it seems more appropriate to store this information as an element, i.e. a child of <imprint>.

For these reasons, I would store the kind code information as a child of the <imprint> element.

P.S.: I attach a file with a brief explanation of the kind codes at the European Patent Office. The extended information of the different kind codes in other patent authorities can be found in http://www.delphion.com/help/kindcodes

Original comment by: sf_user_posejavier
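
To make the attribute-versus-element contrast in the comment above concrete, here is a small illustrative fragment. The kind code value "A2" comes from the examples given in the comment; the type values "letter" and "number" and the nested <idno> in the third variant are hypothetical, and whether a given TEI release accepts any of these forms should be checked against its schema.

```xml
<!-- (1) Kind code as an attribute value: a single atomic token, no room for markup -->
<imprint status="A2"/>

<!-- (2) Kind code as element content, as proposed in the comment above -->
<imprint>
  <idno type="kindCode">A2</idno>
</imprint>

<!-- (3) Hypothetical finer-grained variant: letter and number encoded separately -->
<imprint>
  <idno type="kindCode">
    <idno type="letter">A</idno>
    <idno type="number">2</idno>
  </idno>
</imprint>
```

Only the element forms leave room for further markup inside the kind code, which is the point being argued.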

TEITechnicalCouncil commented 11 years ago

EPO kind code

Original comment by: sf_user_posejavier

TEITechnicalCouncil commented 11 years ago

Now that I see some examples of kind codes, I believe that we should use <classCode> or <catRef>, not <term> (and still not <idno>), for these. See section 2.4.3 ( http://www.tei-c.org/release/doc/tei-p5-doc/en/html/HD.html#HD43 ), especially the last paragraph, for an explanation of these two elements. As in the last paragraph of section 2.4.3, either could be used by the encoder for a kind code depending on whether the kind codes used are from an open-ended system and whether they are documented in the header.

Furthermore, now that I think about it, since <imprint> will not be an empty element, I think we should put status="application¦publication" on <imprint>, not on <biblStruct>, since it relates to "the publication or distribution of a bibliographic item".

So my proposal is:

a) Add @status to <imprint> by way of adding this element to the att.docStatus class.

b) Allow <authority> as a child of <monogr>.

c) No longer require <title> inside <monogr>.

d) Allow <classCode> and <catRef> as a child of <imprint>.

If Council approves, I suggest that:

  1. We ask Javier to provide suggested changes to the prose of section 3.11 and examples of patent citations illustrating (a) through (d) above.

  2. One of the Council members can implement the changes to the content models.

Original comment by: @kshawkin

TEITechnicalCouncil commented 11 years ago

I just noticed that Javier's original example included

    <date type="applicationDate¦publicationDate"></date>

as a child of <biblStruct>, not as a child of <imprint>. Currently this is not allowed in TEI, and I think that we should just put this inside <imprint>, which is for information related to publication and distribution. I think that just because something is still unpublished (only an application) you can and should still use <imprint> for the equivalent information.

Here's a summary of what is being suggested:

    <biblStruct type="patent¦utilityModel¦designPatent¦plant">
      <monogr>
        <authority>
          <orgName type="national¦regional">[name of patent office goes here]</orgName>
        </authority>
        <idno type="docNumber">[document number goes here]</idno>
        <imprint status="application¦publication">
          <classCode>[kind code goes here]</classCode> <!-- Note that you could also use <catRef> here! -->
          <date type="applicationDate¦publicationDate">[date goes here]</date>
        </imprint>
      </monogr>
    </biblStruct>

Original comment by: @kshawkin

TEITechnicalCouncil commented 11 years ago

1) Regarding the idea of storing the <date> as a child of <imprint>, I agree with Kevin and I think it is a good idea. It is important to note that this date refers to the date of filing/publication of the patent, and it is not related to the specific stage of the patent. The application date is the date when a complete application was received, and the publication date is the date on which the patent application is published (i.e. the information is available to the public), normally 18 months after filing or 18 months after the priority date. Since <imprint> "groups information relating to the publication or distribution of a bibliographic item", it seems a good idea to store it there.

2) Regarding the idea of using <classCode> for storing the kindCode, I also agree with Kevin.

3) Regarding the proposal of having status="application¦publication" on <imprint>, I think it is not a good idea, because there are some bibliographic references to patents WITHOUT the kind code and date, which would leave us with an empty <imprint> carrying only the attribute. Furthermore, the status (application¦publication) is a property of the whole bibliographic reference and not only of part of it. Therefore, I still think that the status should be an attribute of <biblStruct>.

So my proposal (which is the same as Kevin except for the @status), would be:

    <biblStruct type="patent¦utilityModel¦designPatent¦plant" status="application¦publication">
      <monogr>
        <authority>
          <orgName type="national¦regional">[name of patent office goes here]</orgName>
        </authority>
        <idno type="docNumber">[document number goes here]</idno>
        <imprint>
          <classCode>[kind code goes here]</classCode> <!-- Note that you could also use <catRef> here! -->
          <date type="applicationDate¦publicationDate">[date goes here]</date>
        </imprint>
      </monogr>
    </biblStruct>

Original comment by: sf_user_posejavier

TEITechnicalCouncil commented 11 years ago

Some quick comments from Lou:

  1. I agree that adding @status on the <biblStruct> makes more sense than adding it on the <imprint> only
  2. I still think a <title> should be present, even if it's just boiler plate. Or why not use <title> in preference to <monogr> as a means of wrapping some of the other parts <title><orgName>...</orgName><idno>...</idno> <date>...</date></title>?
  3. <classCode> and <catRef> are not exactly the same. You can only use the latter if you've got a <classDecl> somewhere to point at.

Original comment by: @lb42
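
The difference between the two elements noted in point 3 can be sketched as follows. This is an illustration only: the scheme URI, the xml:id values, and the bracketed texts are placeholders, and "A1" is used merely as a sample kind code.

```xml
<!-- Option 1: <classCode> carries the code itself; @scheme names the classification system -->
<imprint>
  <classCode scheme="http://example.org/kind-codes">A1</classCode>
  <date type="publicationDate">[date goes here]</date>
</imprint>

<!-- Option 2: <catRef> points at a category that must be declared in the TEI header -->
<classDecl>
  <taxonomy xml:id="kindCodes">
    <category xml:id="kind.A1">
      <catDesc>[description of kind code A1 goes here]</catDesc>
    </category>
  </taxonomy>
</classDecl>

<imprint>
  <catRef target="#kind.A1"/>
  <date type="publicationDate">[date goes here]</date>
</imprint>
```

With <classCode> the citation is self-contained; with <catRef> the meaning of each code is documented once in the header and shared by every citation that points at it.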

TEITechnicalCouncil commented 11 years ago

I still prefer imprint@status instead of biblStruct@status because it is the status of publication, not the status of the citation, that we are noting with this attribute. Furthermore, it's odd that you would put <date type="applicationDate"> and/or <date type="publicationDate"> in <imprint> and yet put status="application" or status="publication" on <biblStruct> rather than <imprint>. In my opinion we should not let the fact that an <imprint> element would be empty stop us if the encoding would be well structured.

That said, I could live with leaving @status on <biblStruct>, especially in order to support other uses of this attribute besides recording application versus publication. If we put it on <biblStruct>, I think we should also do it on <bibl> and <biblFull>.

Lou's suggestion of

<title><orgName>...</orgName><idno>...</idno> <date>...</date></title>

assumes that when reading a citation such as:

U.S. Patent No. 6,885,550 (issued Apr. 26, 2005).

one should understand the title to be "U.S. Patent No. 6,885,550 (issued Apr. 26, 2005)". That's absurd. As I mentioned in a previous comment, and as can be seen in "PatentCitationExamples.doc", patent documents have actual titles, but no one cites them that way. What's the point of having a placeholder title in the data? If a style guide for some citation format requires a title for each citation, then the person writing the stylesheet for converting <biblStruct>s to that citation format can construct the required title however it should be created. Compare this with a citation that lacks a place of publication or publisher: most of us would omit <pubPlace> or <publisher>, and the output would insert "s.l.", "s.n.", "n.d.", "[no date]", etc. in place of the place or publisher.

Original comment by: @kshawkin
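
The idea that a stylesheet can construct whatever label a citation format needs, instead of storing a placeholder title in the data, might look roughly like the XSLT 1.0 sketch below. It assumes the citation is in the TEI namespace and uses type="patent" and type="docNumber" as in the examples in this thread; the literal " Patent No. " is an assumption modelled on the Bluebook example and is not part of any proposal here.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0">
  <xsl:output method="text"/>

  <!-- Use an encoded title if there is one; otherwise build a label from authority and number -->
  <xsl:template match="tei:biblStruct[@type='patent']">
    <xsl:choose>
      <xsl:when test="tei:monogr/tei:title">
        <xsl:value-of select="tei:monogr/tei:title"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="normalize-space(tei:monogr/tei:authority)"/>
        <xsl:text> Patent No. </xsl:text>
        <xsl:value-of select="tei:monogr/tei:idno[@type='docNumber']"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

The encoded data stays free of boilerplate titles, and each output format decides for itself how a title-less citation should be displayed.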

TEITechnicalCouncil commented 11 years ago

Regarding the comments about the suggestion of Lou about <title>, I totally agree with Kevin. The patents have in fact their own titles, and putting the patent reference (patent authority + patent number + kind code + date) as its title (<title><orgName>...</orgName><idno>...</idno> <date>...</date></title>) would be totally confusing and not accepted in the patent community (!!!).

Original comment by: sf_user_posejavier

TEITechnicalCouncil commented 11 years ago

in order to allow for this structure:

    <biblStruct type="patent¦utilityModel¦designPatent¦plant" status="application¦publication">
      <monogr>
        <authority>
          <orgName type="national¦regional">[name of patent office goes here]</orgName>
        </authority>
        <idno type="docNumber">[document number goes here]</idno>
        <imprint>
          <classCode>[kind code goes here]</classCode> <!-- Note that you could also use <catRef> here! -->
          <date type="applicationDate¦publicationDate">[date goes here]</date>
        </imprint>
      </monogr>
    </biblStruct>

In order to help the EPO claim full TEI compliance sooner than later, have made the following schema changes now ( http://tei.svn.sourceforge.net/viewvc/tei?view=revision&revision=10992 ):

a) Add @status to <biblStruct>, <bibl>, and <biblFull> by way of adding these elements to the att.docStatus class.

b) Allow <authority> as a child of <monogr> (by creating a third version of the complicated first half of the content model, which requires this followed by <idno>). I have also loosened the definition of <authority> to no longer refer just to an "electronic file".

c) No longer require <title> inside <monogr> (by creating a third version of the complicated first half of the content model, which requires <authority> followed by <idno>)

d) Allow <classCode> and <catRef> as optional children of <imprint> (before model.dateLike).

I will email Javier asking him to provide suggested changes to the prose of section 3.11 and examples of patent citations illustrating (a) through (d) above, which I will incorporate into the Guidelines once I receive them. Since we will have a release soon, the schema and prose changes are unlikely to happen in the same release.

Original comment by: @kshawkin
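
As a hedged illustration of changes (a) through (d), the Bluebook citation quoted at the top of this ticket, "U.S. Patent No. 6,885,550 (issued Apr. 26, 2005)", might be encoded as below. The choice of "USPTO" as the authority name, the values "patent", "publication", and "national", and the treatment of the issue date as a publication date are all assumptions made for the example.

```xml
<biblStruct type="patent" status="publication">
  <monogr>
    <authority>
      <orgName type="national">USPTO</orgName>
    </authority>
    <idno type="docNumber">6,885,550</idno>
    <imprint>
      <!-- no <classCode> here: the citation gives no kind code -->
      <date type="publicationDate" when="2005-04-26">Apr. 26, 2005</date>
    </imprint>
  </monogr>
</biblStruct>
```

Note that no <title> appears inside <monogr>, which is what change (c) is meant to permit.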

TEITechnicalCouncil commented 11 years ago

Pardon the sloppy editing of my previous comment. I think what I meant is clear enough despite my hasty writing.

I should note that I set up the content model of <monogr> so that if you are going to not have a <title>, you must have exactly one <authority> followed by exactly one <idno>. But it now occurs to me that I'm not sure that we really mean to be so strict. Javier, did you intend for <authority>, <idno>, or both to be optional for a patent citation? Can <authority>, <idno>, or both be repeated (occur multiple times) in the same patent citation? If so, I should adjust the content model.

Original comment by: @kshawkin
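
The strictness being asked about can be pictured with a RELAX NG sketch. This is not copied from the TEI source; it only restates the rule described in the comment above (exactly one <authority> followed by exactly one <idno> when there is no <title>), next to a looser variant of the kind the question contemplates.

```xml
<!-- Strict form, as described above: one <authority>, then one <idno> -->
<rng:group xmlns:rng="http://relaxng.org/ns/structure/1.0">
  <rng:ref name="authority"/>
  <rng:ref name="idno"/>
</rng:group>

<!-- Looser variant: each may be omitted or repeated (hypothetical, not implemented) -->
<rng:group xmlns:rng="http://relaxng.org/ns/structure/1.0">
  <rng:zeroOrMore><rng:ref name="authority"/></rng:zeroOrMore>
  <rng:zeroOrMore><rng:ref name="idno"/></rng:zeroOrMore>
</rng:group>
```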

TEITechnicalCouncil commented 11 years ago

Looking again at the current model for biblStruct, I really think the vision should be to simplify it by having more models there and less carved in stone sequences. Would this not be a good opportunity?

Original comment by: @laurentromary

TEITechnicalCouncil commented 11 years ago

@romary: I'm not sure I understand what you mean. It sounds like you dislike that <monogr> requires elements to appear in a certain order, but I'm not sure what you mean by adding "more models". The change I made in fact introduced a third option (in addition to the two already there) for the structure of data appearing within <monogr> before <imprint>. Here is the latest content model (not yet published in a TEI release):

http://teijenkins.hcmc.uvic.ca:8080/job/TEIP5-Documentation/lastSuccessfulBuild/artifact/Guidelines-web/en/html/ref-monogr.html

Original comment by: @kshawkin

TEITechnicalCouncil commented 11 years ago

I mean seeing if we could have one or several models like model.monogrPart that we could use to define the content model of monogr. For sure we would lose part or all of the currently imposed order, but do we really want to keep this order? I recognize this could be another ticket, though.

Original comment by: @laurentromary

TEITechnicalCouncil commented 11 years ago

@romary: Yes, I think that would be another ticket.

Original comment by: @kshawkin

TEITechnicalCouncil commented 11 years ago

For the record, I prompted Javier off list (in case he isn't receiving notifications through SourceForge) to check on the content model that I introduced. He wrote:

In principle, the <monogr> must have only one <authority> and only one <idno>, so it looks fine to me as you proposed. So, let's go ahead with this proposal !!!! I will adapt our internal system to this proposal, so we can claim to be 100% TEI compliant....GREAT!!!

He also agreed to draft text for the Guidelines, so I'll keep this ticket open till that prose gets integrated into the Guidelines.

Original comment by: @kshawkin

TEITechnicalCouncil commented 11 years ago

Javier told me on 2012-11-05 that he is working on the text for the Guidelines. I have been unable to reach him to ask for the status since 2012-12-20, when I first discovered that hotmail is blocking email sent from my ISP. Hopefully his mail to me isn't also being blocked.

Javier, if you are reading this, please feel free to post an update here with an attachment of your proposed changes.

Original comment by: @kshawkin

TEITechnicalCouncil commented 11 years ago

I can predict that Javier is about to answer after two months AFK, but for a very good (also at times crying) reason. We should congratulate him ;-)

Original comment by: @laurentromary

TEITechnicalCouncil commented 11 years ago

Sorry for the delay. No excuses, it was my fault (!!) I am restarting work on it today. Some other issues kept me away from the work. Kevin, I think that I am able to receive your mails. Could you please send me one as a trial to double-check that everything is all right?

Original comment by: sf_user_posejavier

TEITechnicalCouncil commented 11 years ago

Sending mail to Javier still doesn't work for me. My ISP has opened two tickets with Hotmail over the past few months, but they aren't responding. :( The automatic response is below:

This is the mail system at host mail89.csoft.net.

I'm sorry to have to inform you that your message could not be delivered to one or more recipients. It's attached below.

For further assistance, please send mail to postmaster.

If you do so, please include this problem report. You can delete your own text from the attached returned message.

The mail system

<posejavier@hotmail.com>: host mx3.hotmail.com[65.54.188.110] said: 550 SC-001 (BAY0-MC3-F38) Unfortunately, messages from 205.205.224.4 weren't sent. Please contact your Internet service provider since part of their network is on our block list. You can also refer your provider to http://mail.live.com/mail/troubleshooting.aspx#errors. (in reply to MAIL FROM command)

Original comment by: @kshawkin

TEITechnicalCouncil commented 11 years ago

Kevin, I will get a new email address in a few days. I will post it here as soon as I have it.

Original comment by: sf_user_posejavier

TEITechnicalCouncil commented 11 years ago

Hello Javier,

I received your email with a draft of the text and tried responding by email, but Hotmail is still not accepting email from my ISP!

Unless you are sure that your colleague will track changes from this version, I would prefer not to make any corrections myself or even put this version into the TEI SourceForge repository because it then might be more work later to figure out what needs to be changed. So I think it's best to wait.

If you'd still like to talk with me before your colleague reports back with revisions, I could Skype tomorrow (on Sunday) before 11 a.m. Eastern Time (17:00 in Central Europe).

—Kevin

Original comment by: @kshawkin

TEITechnicalCouncil commented 11 years ago

Javier and Kevin have corresponded privately, and Javier sent Kevin the attached revision.

Original comment by: @kshawkin

TEITechnicalCouncil commented 11 years ago

Javier, I've reviewed your suggested text and made some further edits (attached). As you'll see, I've removed your specific instructions for values of @type on the various elements. The elements on which you are using @type have other uses besides for patents, so I don't want to imply that these values, which are specific to your ontology of patents, are the only ones allowed.

I hope Lou will respond to my email on why the "Authors, Titles, and Editors" section begins by saying that citations typically begin with a title followed by an author (because if we change that, we should also reword a sentence in these proposed revisions). I welcome comments by anyone reading this thread on whether we should create a new section between 3.11.2.2 and 3.11.2.3 on patent citations or just insert Javier's text at the end of 3.11.2.2.

Original comment by: @kshawkin

TEITechnicalCouncil commented 11 years ago

I would support the introduction of a section between 3.11.2.2 and 3.11.2.3. It is a nice illustration of the capacity of the TEI to treat such a specific kind of object and would facilitate further reference.

Original comment by: @laurentromary

TEITechnicalCouncil commented 11 years ago

I would also support the introduction of a section between 3.11.2.2 and 3.11.2.3. Even from a documentalist's point of view alone, patent literature is very important: approximately 25% of all scientific or technical publications produced each year originate in patent offices around the world, and most of them can be searched like any other kind of literature in free-access databases. Over the last 10 years, the number of patent filings has consistently exceeded the number of scientific and technical journal articles published. Therefore, as Laurent said, it is a very relevant example to show the power of TEI.

Original comment by: sf_user_posejavier

TEITechnicalCouncil commented 11 years ago

  1. Happy to see "followed by" replaced by "and", but I don't think it addresses the real problem I have with this proposal (which I raised over a year ago) -- namely that it results in a <title>-free <biblStruct> which just seems wrong to me.
  2. In the proposed draft text, I see you have something which is formatted to look vaguely like a <specList> (hard to be sure since it's in some weird proprietary format) but which doesn't use the standard definitions for the elements. I would strongly recommend rewording this so that it makes clear that this is a list of recommended ways of using these elements in this context. So for example change "The following elements are provided for tagging.." to "The following TEI elements may be used for tagging..." and in each case change "contains" to "may be used to contain" vel sim.
  3. Using <classCode> for this "kind code" thingie is fine, but don't you need somewhere to provide an indication of where these codes are defined? e.g. a @source
  4. Happy to see this being added as an example to the existing section. Less happy to see it being a new section: this bloated chapter is already fiendishly over-structured. Also we're not defining any new elements here, just saying how to use existing ones for a specific application.

Original comment by: @lb42

TEITechnicalCouncil commented 11 years ago

Revised version attached. Specific notes below ...

  1. I've reworded the beginning of 3.11.2.2 at revision 11790. I agree that it doesn't address any part of this proposal, but the wording there affects what wording is appropriate to use in the text to be inserted.
  2. I also noted the text that looks like a <specList> but which isn't because there is text following each element name that isn't simply the definition of that element as given in the element specification. So I was thinking of implementing this in a way that is simply a <list>. Lou's recommended revisions here make sense.
  3. Yes, that's right. Jose, can you give me a plausible value of @source for your example for section 3.11.2.2? You'll also want to be sure to include this in any other documents you are working on (like the page for the TEI wiki).
  4. The fact that the chapter is bloated shouldn't stop us from adding a new section: it just means that someone needs to divide this chapter into two or more. We shouldn't punish the patent people just because they want to add something which belongs in the bloated chapter. But I agree with Lou that we are only saying how to use existing elements without defining new ones, which has given me pause about adding a new section.

Original comment by: @kshawkin

TEITechnicalCouncil commented 11 years ago

Regarding item 3, I checked classCode in the standard and did not find @source as an attribute of this element. I suppose you are planning to add it as a possible attribute of the element. In that case, given that @source "provides a pointer to the bibliographical source from which a quotation or citation is drawn", the example for section 3.11.2.2 should have source="uspto" because the example is a citation of a US patent, i.e. the kind code is defined by the USPTO (United States Patent and Trademark Office). Therefore, the example in section 3.11.2.2 should contain:

classCode source="uspto"

Regarding the example in section 3.11.2.3, it must also be updated with source='epo' because in this case it deals with a European patent application, i.e. the kind code is defined by the EPO (European Patent Office). Therefore, the example in section 3.11.2.3 should contain:

classCode source="epo"

Only one further comment/question: I suppose that with the introduction of @source in classCode there is no longer a need for @scheme. Kevin, is this correct? In case @scheme is still needed, I would put in the example of section 3.11.2.2 the value:

scheme="http://www.uspto.gov"

...and in section 3.11.2.3 the value:

scheme="http://www.epo.org"

Kevin, please confirm the final format so that I can update the document accordingly and create the wiki page with the right examples.

Original comment by: sf_user_posejavier

TEITechnicalCouncil commented 11 years ago

Javier, the Markdown syntax used by the new SF system can't handle angle brackets, but if you put "`" at the beginning of a line, it renders it as is.

I believe that when Lou mentioned @source in his list item number 3, he meant @scheme. That's what I was referring to in my response. It seems you figured that out by the end of your comment. I've updated the attachment.
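
For anyone following the thread without the attachment, the encoding under discussion might look roughly like the sketch below. It assumes the kind code goes in a classCode inside imprint, with @scheme indicating where the code is defined, as discussed above. The document number, kind code value, date, and @type values are placeholders invented for illustration, and the attached draft may differ in detail.

```xml
<!-- Hypothetical sketch: document number, kind code, and date are placeholders, not a real citation -->
<biblStruct type="patent">
  <monogr>
    <authority>
      <orgName type="national">United States Patent and Trademark Office</orgName>
    </authority>
    <idno type="docNumber">US0000000</idno>
    <imprint>
      <!-- kind code; @scheme indicates where the code is defined -->
      <classCode scheme="http://www.uspto.gov">B2</classCode>
      <date type="publicationDate">2013-01-01</date>
    </imprint>
  </monogr>
</biblStruct>
```

For a European patent application, the same pattern would apply with scheme="http://www.epo.org".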

Original comment by: @kshawkin

TEITechnicalCouncil commented 11 years ago

Thanks Kevin!!! I will keep the "`" trick in mind for further posts ;-) I am now working on the wiki article :-)

Original comment by: sf_user_posejavier

TEITechnicalCouncil commented 11 years ago

I have studied the structure of section 3.11 closely as I ponder whether to create a new subsection as we've discussed in this ticket. Long story short: I propose we do create a new section but broaden its scope slightly, calling it "Document Identifiers". Revised proposed edits attached, and more detail below if you're interested.

As I looked at this section closely, I realized that there were some things that I sloppily edited back in January when implementing another ticket. I've now cleaned up those things, so the latest version is at:

http://teijenkins.hcmc.uvic.ca/job/TEIP5/lastSuccessfulBuild/artifact/release/doc/tei-p5-doc/en/html/CO.html#COBI

The specific changes are shown in ODD format at:

https://sourceforge.net/p/tei/code/11820/

As you can see, I changed some of the subsection headings.

So now I've taken the latest version of Javier's proposal, accepted all the changes (assuming we've agreed on them), and "synchronized" the text with the revised subsection headings as in revision 11820.

Then I tracked changes again to show my new revisions to the text (and further revisions to subsection headings) that I think will lead to a smooth integration of Javier's proposed wording.

I've been staring at this too long, so I'm going to implement it this weekend with fresh eyes.

Original comment by: @kshawkin

TEITechnicalCouncil commented 11 years ago

I noticed the word "pagination" in the section title was changed to "Size of a Document". Should that really be "Length of a Document"? Size suggests 6" x 9", that kind of thing.

Original comment by: @martindholmes

TEITechnicalCouncil commented 11 years ago

Since the discussion of biblStruct and citedRange is no longer in this section, we're just left with extent, which can include not just physical size but also its size in bytes, number of reels of tape, etc. So I thought "size of document" is the most appropriate brief way to describe this.

Original comment by: @kshawkin