Telescope workflow implementation: Ebsco

aroelo commented 4 years ago

For which publishers will we extract this data?
The oldest report in the mail is January 2021. From which date do we want to collect EBSCO data?
Is the email sent automatically?
- Is the title of the mail typed manually or generated automatically?
- Will the mail always be from the same sender?
- On which date each month is the report sent? Is this variable or always the same date?
- The mail with title EBSCO eBook Usage - April 2021 has the attachment University of Michigan Press eBook Usage Monthly March 2021.xlsx, so the months conflict. There seems to be no date inside the report, so do we use the filename or the mail title as ground truth? The filename might not be an option, because the filenames do not seem to be automatically generated (see below).
Is the report itself generated automatically?
- The filenames of the report differ: University of Michigan Press_eBook Usage_Jan21.xlsx, University of Michigan Press EBSCO eBook Usage_Feb21.xlsx and University of Michigan Press eBook Usage Monthly March 2021
- In the Data Trust google doc there is a field Month_of_Log_Month in the EBSCO schema, but this is only available in the January report.
Do any of the fields in the CSV file contain multiple values, which ones and what is the delimiter? (e.g. maybe the 'Subjects' field, with '/' as a delimiter)
What is the 'Retrieval Count' exactly? E.g. is this the number of downloads per unique IP address, or can one IP address account for multiple downloads? Is it the downloads per chapter, aggregated to a single book, or downloads per whole book, etc.?
What is the difference between 'Imprint Publisher' and 'Contract Publisher'? They are the same for UMP, but in general what is the difference between these two terms?

aroelo commented 3 years ago

@alkimozaygen Could you have a look at the questions in the description of this issue?

alkimozaygen commented 3 years ago

@aroelo

For which publishers will we extract this data? University of Michigan Press and Wits University Press. However, Wits University Press titles are not hosted on EBSCO. EBSCO is probably forwarding to one of the dissemination platforms used by Wits University Press. I will try to learn more about this.
The oldest report in the mail is January 2021, from which date do we want to collect EBSCO data? I will ask both partners to send their older reports starting from January 2020.
Do any of the fields in the CSV file contain multiple values, which ones and what is the delimiter? (e.g. maybe the 'Subjects' field, with '/' as a delimiter) It seems '/' is only used in the 'Subjects' field. The subjects go from the highest level to the lowest level (i.e. a title with the subject field 'HISTORY / United States / 20th Century' is classified under History at the highest level, and at the lowest level it is about the history of the United States in the 20th century).
What is the difference between 'Imprint Publisher' and 'Contract Publisher', they are the same for UMP, but in general what is the difference between these two terms? The contract publisher is the legal name of the publisher, and the imprint publisher is the trade name of the publisher (you can think of the contract publisher as the parent company). For example, the contract publisher can be 'Springer Nature' and the imprint publisher can be 'Palgrave Macmillan'.

I will try to learn about your other questions and let you know.
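As an illustration of the 'Subjects' hierarchy described above, here is a minimal sketch of how a telescope could split the field into levels. The function name split_subjects is hypothetical, and the sketch assumes each cell holds a single subject path with '/' as the only delimiter, which has not been confirmed for every report.

```python
from typing import List


def split_subjects(subjects: str) -> List[str]:
    """Split a 'Subjects' value into its levels, broadest level first.

    Assumes one subject path per cell, delimited by '/'.
    """
    # Strip whitespace around each level and drop empty fragments,
    # e.g. from a trailing delimiter or an empty cell.
    return [level.strip() for level in subjects.split("/") if level.strip()]


levels = split_subjects("HISTORY / United States / 20th Century")
print(levels)  # ['HISTORY', 'United States', '20th Century']
```

Keeping the levels as an ordered list would let the broadest subject be read as levels[0] without losing the more specific ones.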

alkimozaygen commented 3 years ago

@Aniek I forwarded your questions to the University of Michigan Press. Below are Charles Watkinson's answers.

Is the email sent automatically? It is sent manually by Thomas Smith.
Will the mail always be from the same sender? Currently, yes. But only while he remains in the role (for example, it was previously sent by a different EBSCO employee, Ann Droppers).
Which date each month is the report sent, is this variable or always the same date? The date has been variable. It seems to be when Thomas Smith gets to it.
The mail with title EBSCO eBook Usage - April 2021 has the attachment University of Michigan Press eBook Usage Monthly March 2021.xlsx, the months conflict. There seems to be no date inside the report, do we use the filename or mail title as ground truth? The file name might not be an option, because it seems they are not automatically generated (see below) I think the mail title is the most consistent, but the fact that neither the mail title nor the file appear to be automatically generated means that there may be some variation.
Is the report itself generated automatically? It appears to be pulled as needed from a central database. I am not clear on EBSCO's internal processes.
The filenames of the report differ: University of Michigan Press_eBook Usage_Jan21.xlsx, University of Michigan Press EBSCO eBook Usage_Feb21.xlsx and University of Michigan Press eBook Usage Monthly March 2021 Yes. The email subject lines seem more consistent.
In the Data Trust google doc there is a field Month_of_Log_Month in the EBSCO schema, but this is only available in the January report. Yes. The problem of consistency emerges again. This is not automatically generated.
What is the 'Retrieval Count' exactly? E.g. is this the number of downloads per unique IP address or can one IP address account for multiple downloads. Is it the downloads per chapter, aggregated to a single book or downloads per whole book, etc. I asked for a definition. "Retrieval Count is comprised of a sum total of the following activities. It is not equivalent to any COUNTER 5 report fields: View, Print, Email, Save, Export, Download, Sample." EBSCO presents titles at the book, not chapter level.
Does EBSCO provide a dashboard or a platform like JSTOR or Google Books where you can create scheduled reports? No
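If the mail title ends up being used as the ground truth for the month, a rough sketch of extracting it could look like the following. The pattern is based only on the one subject line quoted in this thread (EBSCO eBook Usage - April 2021); the function name and the regular expression are assumptions, and a manually typed subject may not always match. It also only returns the month named in the subject; whether that is the usage month is still an open question.

```python
import re
from datetime import date
from typing import Optional

# Explicit month names avoid depending on the system locale.
MONTHS = {
    name: number
    for number, name in enumerate(
        [
            "january", "february", "march", "april", "may", "june",
            "july", "august", "september", "october", "november", "december",
        ],
        start=1,
    )
}

# Assumed subject format, taken from the single example in this thread.
SUBJECT_PATTERN = re.compile(
    r"EBSCO eBook Usage\s*-\s*([A-Za-z]+)\s+(\d{4})", re.IGNORECASE
)


def report_month_from_subject(subject: str) -> Optional[date]:
    """Return the first day of the month named in the subject, or None."""
    match = SUBJECT_PATTERN.search(subject)
    if match is None:
        return None
    month = MONTHS.get(match.group(1).lower())
    if month is None:
        # Unrecognised month text (e.g. an abbreviation or a typo).
        return None
    return date(int(match.group(2)), month, 1)


print(report_month_from_subject("EBSCO eBook Usage - April 2021"))  # 2021-04-01
```

Returning None instead of raising would let the workflow flag a mail for manual review when the subject does not follow the expected format.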

rhosking commented 3 years ago

Thanks Alkim. It sounds like for this to be a viable option for developing a telescope, we need a better understanding of what those internal processes are. Might it be an option, possibly via Charles, to ask EBSCO what options they have for providing something more routine and consistent, ideally sent directly from EBSCO to our account?

aroelo commented 3 years ago

Thanks for those quick answers @alkimozaygen. I agree with @rhosking that it would be really helpful if we could get something that is generated more automatically.

We could possibly work around the fact that the email is not sent automatically by manually checking the mail ourselves, downloading the report and putting it on the SFTP server.

This would still leave an issue, though, if the report itself is not generated automatically either. The most concerning part, I think, is that one of the fields is only available in the January report. If this is a sign that fields may or may not be in the report at random, it will be hard to work with the data.
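To make the concern about missing fields concrete, here is a minimal sketch of a header check the telescope could run before loading a report. The split into required and optional columns is an assumption for illustration, as are the exact header spellings (taken from the field names mentioned in this issue) and the function name check_columns.

```python
from typing import Iterable, List, Tuple

# Assumed header names, based on the fields mentioned in this issue.
REQUIRED_COLUMNS = [
    "Subjects",
    "Retrieval Count",
    "Imprint Publisher",
    "Contract Publisher",
]
# Only seen in the January report so far, so treated as optional here.
OPTIONAL_COLUMNS = ["Month_of_Log_Month"]


def check_columns(header: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (missing required columns, missing optional columns)."""
    present = {name.strip() for name in header}
    missing_required = [c for c in REQUIRED_COLUMNS if c not in present]
    missing_optional = [c for c in OPTIONAL_COLUMNS if c not in present]
    return missing_required, missing_optional


missing_required, missing_optional = check_columns(
    ["Subjects", "Retrieval Count", "Imprint Publisher", "Contract Publisher"]
)
print(missing_required)  # []
print(missing_optional)  # ['Month_of_Log_Month']
```

A report with missing required columns could then be rejected early, while a missing optional column is only logged.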

The-Academic-Observatory / oaebu-workflows

Telescope workflow implementation: Ebsco #6