adsabs / export_service

Export service to output ADS records with various formats including BibTex, AASTex, and multiple tagged and xml options
MIT License

missing editor #253

Closed golnazads closed 3 days ago

golnazads commented 4 weeks ago

This is what I get from ADS:

@PROCEEDINGS{1997ASPC..128.....A,
        title = "{Mass ejection from AGN : proceedings of a workshop held at the Carnegie Observatories in Pasadena, California, 19-21 February 1997}",
     keywords = {ACTIVE GALACTIC NUCLEI, QUASARS, OUTFLOWS, MASS LOSS, CONFERENCES, Quasars, Active galactic nuclei},
    booktitle = {Mass Ejection from Active Galactic Nuclei},
         year = 1997,
       series = {Astronomical Society of the Pacific Conference Series},
       volume = {128},
        month = jan,
       adsurl = {https://ui.adsabs.harvard.edu/abs/1997ASPC..128.....A},
      adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}

I put it into .bib file and replace the first line. This is what is in the .bib file:

@PROCEEDINGS{arav97,
        title = "{Mass ejection from AGN : proceedings of a workshop held at the Carnegie Observatories in Pasadena, California, 19-21 February 1997}",
     keywords = {ACTIVE GALACTIC NUCLEI, QUASARS, OUTFLOWS, MASS LOSS, CONFERENCES, Quasars, Active galactic nuclei},
    booktitle = {Mass Ejection from Active Galactic Nuclei},
         year = 1997,
       series = {Astronomical Society of the Pacific Conference Series},
       volume = {128},
        month = jan,
       adsurl = {https://ui.adsabs.harvard.edu/abs/1997ASPC..128.....A},
      adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}

This is what I get after recompiling, in the pdf file in the reference list:

1997, Astronomical Society of the Pacific Conference Series, Vol. 128, Mass ejection from AGN : proceedings of a workshop held at the Carnegie Observatories in Pasadena, California, 19-21 February 1997

So no authors....

AND in the text of the paper, I get:

ara 1997 .... THIS IS INSTEAD of the reference.

So what is wrong?

golnazads commented 4 weeks ago

Hiver conversation:

June 12, 2024 11:28 AM You
@Carolyn @Alberto @Edwin This record has doctype proceedings. For proceedings, BibTeX displays the editor, not the author, and there is no editor for this record in Solr.

June 12, 2024 12:54 PM Alberto replied to You I have no memory of any recent changes regarding this. @Carolyn @Edwin @Golnaz AFAIK for proceedings records we keep the editors in the "author" field, so we need to change this. Or at a minimum, use what's in "editor" when available, and "author" otherwise.

June 12, 2024 1:24 PM You replied to Alberto @Carolyn @Edwin I have to disagree. Editor field needs to be correctly populated. I did work on it back in 2018 I think, it was populated correctly at that time and export was using it. I think it should be fixed again upstream, adspy.

June 12, 2024 2:00 PM Carolyn Very few records contain an editor. For conference proceedings, we have used the author of the book entry instead.

June 17, 2024 8:53 AM Alberto replied to You @Carolyn @Edwin @Golnaz Hi Golnaz, the editor field gets properly populated for the papers in the proceedings volume (see e.g. https://ui.adsabs.harvard.edu/abs/1997ASPC..128..305S/exportcitation). But for the "book entry" we need the author list to be populated with the names of the editors, since they are owed credit for the book as authors. Please update the code so that we use the Solr author field to populate the BibTeX editor field for @PROCEEDINGS.
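For reference, the fallback proposed in this conversation ("use what's in editor when available, and author otherwise" for proceedings) could be sketched roughly as below. This is only an illustration, not the export service's actual code: the function name `bibtex_editor_names` and the plain-dict record are hypothetical, and the name in the sample record is a placeholder. Only the field names `doctype`, `author`, and `editor` come from the discussion.

```python
def bibtex_editor_names(doc):
    """Return the names to emit in the BibTeX editor field for one record.

    `doc` is a hypothetical dict of Solr-style fields (assumption).
    """
    editors = doc.get("editor") or []
    if editors:
        # An explicit editor list always wins.
        return list(editors)
    if doc.get("doctype") == "proceedings":
        # Proposed fallback: the volume-level entry keeps its editors
        # in the author field, so reuse that list.
        return list(doc.get("author") or [])
    # Every other doctype: no editor field is emitted.
    return []


# Hypothetical proceedings record with no editor field.
volume = {"doctype": "proceedings", "author": ["Lastname, A."]}
print(bibtex_editor_names(volume))  # ['Lastname, A.']
```

Whether this check belongs in the export service or upstream in adspy is the point under discussion in the comments that follow.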

golnazads commented 4 weeks ago

Email Conversation

From Alberto Export service Update BibTeX export for @PROCEEDINGS to include solr authors in the editor field (per hiver email)

From You Regarding adding the authors if the editor is not available for BibTex. I disagreed with that on Hiver and still do. After I posted on hiver I checked the adspy code, and as I remembered I had worked on it during 5/2018. I cannot believe that I knew exactly what year I had worked on that, lol. It has to get fixed from that end. I will try to convince you of that at the next meeting.

golnazads commented 4 weeks ago

Per BibTeX, four doctypes can have editors: proceedings, inproceedings, inbook, and abstract. In addition to BibTeX, EndNote and RefWorks also have an editor output field. However, unlike BibTeX, these two formats do not distinguish between different document types; they display the editor if available, otherwise, they do not.

Here are my three reasons for ensuring that the editor field is correctly populated in any records from the adspy side and allowing the export service to simply format the metadata without making decisions on field replacements:

1- Since adspy has access to complete information at the point of record ingestion, populating the field with the correct information there ensures that data integrity is preserved. This approach avoids the risk of introducing errors or inconsistencies during the downstream formatting process.

2- If the editor field is available from Solr, any format can include the editor information. As noted above, there are currently three formats specifying the inclusion of the editor. In the near future, other formats are likely to request this as well. Having the metadata already in Solr means outputting it would be trivial and correct.

3- By ensuring that all necessary fields, including the editor, are correctly populated during the record ingestion phase, the exporting step becomes straightforward. There is no need for additional logic to replace missing fields, which reduces the complexity of the export service and minimizes the risk of formatting errors. Additionally, having logic in the export process can significantly slow it down. Since this is a service that needs to format many records, even one extra comparison can slow it down, impacting overall performance and efficiency.

golnazads commented 4 weeks ago

I am compiling a list of records for the four doctypes for which BibTeX displays the editor when one is available. There are 3335413 records, and I have retrieved 1408000 so far, with a handful of 502s.

Here is the breakdown:

has editor: {'inproceedings': 184729, 'proceedings': 0, 'abstract': 601, 'inbook': 9700}
no editor: {'inproceedings': 604678, 'proceedings': 4847, 'abstract': 593657, 'inbook': 9788}
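A tally like this can be produced with a few lines of Python. The sketch below is an assumption about how such a count might be done, not the script actually used: `tally_editors` is a hypothetical helper, the records are plain dicts with `doctype` and `editor` keys, and the editor name in the sample data is a placeholder.

```python
from collections import Counter

# The four doctypes for which BibTeX can show an editor, per this thread.
DOCTYPES = ("inproceedings", "proceedings", "abstract", "inbook")


def tally_editors(docs):
    """Count records with and without an editor list, per doctype."""
    has_editor, no_editor = Counter(), Counter()
    for doc in docs:
        doctype = doc.get("doctype")
        if doctype not in DOCTYPES:
            continue  # other doctypes are outside the scope of this check
        if doc.get("editor"):
            has_editor[doctype] += 1
        else:
            no_editor[doctype] += 1
    return has_editor, no_editor


# Tiny made-up sample; the real run covers millions of records.
sample = [
    {"bibcode": "1997ASPC..128..305S", "doctype": "inproceedings", "editor": ["Lastname, A."]},
    {"bibcode": "2024SPIE13176E....Y", "doctype": "proceedings"},
]
has_editor, no_editor = tally_editors(sample)
print(dict(has_editor), dict(no_editor))
```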

I looked at the 4847 proceedings records. All of them either have no author or have bibcodes of this type:

['2024SPIE13176E....Y', 'proceedings']
['2024SPIE13174E....G', 'proceedings']
['2024SPIE13169E....C', 'proceedings']
['2024SPIE13164E....N', 'proceedings']
['2024SPIE13162E....L', 'proceedings']

That can easily be fixed in adspy here: https://github.com/adsabs/adspy/blob/master/ADSCachedExports.py#L934C57-L934C81. As you explained to me when you gave me the code, bibcodes of this format (bibcode[14:18] != '....') are used to populate the editor for other records in the volume. I think this is the point in adspy where it should be decided whether the editor field for these records should be set equal to the author list.

I will check the other types when the list is complete, to see whether the records with missing editors are correct or not.

golnazads commented 4 weeks ago

Just checked one inbook record with no editor ['2014amsp.book....1M', 'inbook'] but there is 2014amsp.book.....M that should have been used as editor for 2014amsp.book....1M and 5 other records.
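To make the relationship between these two bibcodes concrete, here is a minimal sketch of how a counterpart lookup might work. It is based only on the bibcodes quoted in this thread, where the book-level entry has '....' at positions 14 to 17 and shares its first 14 characters with the chapter. The helper names `is_volume_entry` and `find_volume_entry` are hypothetical, and the actual condition used in adspy may differ.

```python
def is_volume_entry(bibcode):
    """True if a 19-character bibcode has '....' at positions 14-17 (assumed book/volume-level pattern)."""
    return len(bibcode) == 19 and bibcode[14:18] == "...."


def find_volume_entry(bibcode, candidates):
    """Return the first volume-level bibcode sharing the first 14 characters, else None."""
    for candidate in candidates:
        if (
            candidate != bibcode
            and is_volume_entry(candidate)
            and candidate[:14] == bibcode[:14]
        ):
            return candidate
    return None


known = ["2014amsp.book....1M", "2014amsp.book.....M"]
print(find_volume_entry("2014amsp.book....1M", known))  # 2014amsp.book.....M
```

The author list of the bibcode returned here is what would be copied into the editor field of the chapter record.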

So even if I put logic in the export service to display authors instead of editors for proceedings, there are still records that should have editors and do not.

So again, I am advocating checking the results I am producing today to see how the adspy logic should be updated to account for these missing editors.

@aaccomazzi

aaccomazzi commented 3 weeks ago

The reason for the missing editor field in 2014amsp.book....1M seems to be due to the fact that we don't have a TOC link for it (see https://github.com/adsabs/adspy/blame/master/ADSCachedExports.py#L1978). This is a somewhat fragile way to decide whether or not we should look for the editor in a book entry. I would advocate using the doctype information to decide if an editor field should be created or not.

As far as the "book entries" are concerned (e.g. 2014amsp.book.....M), I don't necessarily oppose creating an editor field, although we have always kept the editor info in the author field, since the expectation is that an author search will turn up the monographs that somebody edited. But just because we haven't done it this way before doesn't mean we shouldn't do it now. I would like to hear opinions from the curation team on this one: @donnat @csgrant00 @ehenneken

Finally, we seem to have incorrect editor metadata for the book entry in question (2014amsp.book.....M) if you compare what the publisher page has (https://www.worldscientific.com/worldscibooks/10.1142/8851#t=aboutBook). @csgrant00 can you please check this out?

golnazads commented 3 weeks ago

Here are the results:

has editor: {'inproceedings': 867124, 'proceedings': 15, 'abstract': 5779, 'inbook': 51008}
no editor: {'inproceedings': 971738, 'proceedings': 58167, 'abstract': 1323120, 'inbook': 58461}
total: 3335412

I checked 2000 records each from the inproceedings and abstract doctypes with no editor list, to see if there is a counterpart bibcode with a [14:18] != '....' match that the editor list can be extracted from. For these 4000, there were none. So I am concluding that, more than likely, these are all correct and do not have editors.

For the inbook records with missing editors, I checked all 58461, and of these, 33228 have a counterpart bibcode with a [14:18] != '....' match whose author list can be assigned as the editor. Let me know if you want the list of bibcodes together with the matched bibcodes to take the editor list from.

For proceedings with missing editors, out of the 58167 records, 54503 have authors. I am not sure whether, for all of them, the authors can be considered editors. My guess is that if the author list from any of these records is assigned as the editor for other records, then that record should have its authors as editors as well; otherwise, no!

I hope that I was convincing enough so that any modification that needs to be made is going to be made from the adspy side for this issue.