fnl / medic

a Python 3 command-line tool to maintain a DB mirror of MEDLINE (https://pypi.python.org/pypi/medic) - ALERT: As I have moved out of science and am working as a consultant now, this project might need a new maintainer once PubMed changes its XML format. Heroes?
GNU General Public License v3.0

Warnings while creating SQLite database #6

Open breisfeld opened 6 years ago

breisfeld commented 6 years ago

Hi,

Your program medic looks extremely useful.

I am trying to create a medline sql database using files downloaded from ftp://ftp.ncbi.nlm.nih.gov/pubmed/baseline.

After the download, I issue the following shell command:

$ for file in baseline/medline17n*.xml.gz
do
  medic --url sqlite:///medline.db update $file
done

For each file, I get the following database table-related warnings:

K:\Python\Anaconda\lib\site-packages\sqlalchemy\orm\relationships.py:2694: SAWarning: relationship 'Qualifier.citation' will copy column citations.pmid to column qualifiers.pmid, which conflicts with relationship(s): 'Qualifier.descriptor' (copies descriptors.pmid to qualifiers.pmid), 'Descriptor.qualifiers' (copies descriptors.pmid to qualifiers.pmid). Consider applying viewonly=True to read-only relationships, or provide a primaryjoin condition marking writable columns with the foreign() annotation.
  for (pr, fr_) in other_props)
K:\Python\Anaconda\lib\site-packages\sqlalchemy\orm\relationships.py:2694: SAWarning: relationship 'Citation.qualifiers' will copy column citations.pmid to column qualifiers.pmid, which conflicts with relationship(s): 'Qualifier.descriptor' (copies descriptors.pmid to qualifiers.pmid), 'Descriptor.qualifiers' (copies descriptors.pmid to qualifiers.pmid). Consider applying viewonly=True to read-only relationships, or provide a primaryjoin condition marking writable columns with the foreign() annotation.
  for (pr, fr_) in other_props)
K:\Python\Anaconda\lib\site-packages\sqlalchemy\orm\relationships.py:2694: SAWarning: relationship 'Section.citation' will copy column citations.pmid to column sections.pmid, which conflicts with relationship(s): 'Section.abstract' (copies abstracts.pmid to sections.pmid), 'Abstract.sections' (copies abstracts.pmid to sections.pmid). Consider applying viewonly=True to read-only relationships, or provide a primaryjoin condition marking writable columns with the foreign() annotation.
  for (pr, fr_) in other_props)
K:\Python\Anaconda\lib\site-packages\sqlalchemy\orm\relationships.py:2694: SAWarning: relationship 'Citation.sections' will copy column citations.pmid to column sections.pmid, which conflicts with relationship(s): 'Section.abstract' (copies abstracts.pmid to sections.pmid), 'Abstract.sections' (copies abstracts.pmid to sections.pmid). Consider applying viewonly=True to read-only relationships, or provide a primaryjoin condition marking writable columns with the foreign() annotation.
  for (pr, fr_) in other_props)

Platform: Windows 7
Medic: 2.4.1
Python: Python 3.6.3 |Anaconda custom (64-bit)|

fnl commented 6 years ago

And you did the setup step to "create" your database, as recommended in Setup? (medic insert --url sqlite:///medline.db 123456)?

fnl commented 6 years ago

Forgot to add, as per the Setup instructions: And you created the tables with medic --url sqlite:///medline.db write 123?

breisfeld commented 6 years ago

$ medic insert --url sqlite:///medline.db 123456

K:\Python\Anaconda\lib\site-packages\sqlalchemy\orm\relationships.py:2694: SAWarning: relationship 'Qualifier.citation' will copy column citations.pmid to column qualifiers.pmid, which conflicts with relationship(s): 'Qualifier.descriptor' (copies descriptors.pmid to qualifiers.pmid), 'Descriptor.qualifiers' (copies descriptors.pmid to qualifiers.pmid). Consider applying viewonly=True to read-only relationships, or provide a primaryjoin condition marking writable columns with the foreign() annotation.
  for (pr, fr_) in other_props)
K:\Python\Anaconda\lib\site-packages\sqlalchemy\orm\relationships.py:2694: SAWarning: relationship 'Citation.qualifiers' will copy column citations.pmid to column qualifiers.pmid, which conflicts with relationship(s): 'Qualifier.descriptor' (copies descriptors.pmid to qualifiers.pmid), 'Descriptor.qualifiers' (copies descriptors.pmid to qualifiers.pmid). Consider applying viewonly=True to read-only relationships, or provide a primaryjoin condition marking writable columns with the foreign() annotation.
  for (pr, fr_) in other_props)
K:\Python\Anaconda\lib\site-packages\sqlalchemy\orm\relationships.py:2694: SAWarning: relationship 'Section.citation' will copy column citations.pmid to column sections.pmid, which conflicts with relationship(s): 'Section.abstract' (copies abstracts.pmid to sections.pmid), 'Abstract.sections' (copies abstracts.pmid to sections.pmid). Consider applying viewonly=True to read-only relationships, or provide a primaryjoin condition marking writable columns with the foreign() annotation.
  for (pr, fr_) in other_props)
K:\Python\Anaconda\lib\site-packages\sqlalchemy\orm\relationships.py:2694: SAWarning: relationship 'Citation.sections' will copy column citations.pmid to column sections.pmid, which conflicts with relationship(s): 'Section.abstract' (copies abstracts.pmid to sections.pmid), 'Abstract.sections' (copies abstracts.pmid to sections.pmid). Consider applying viewonly=True to read-only relationships, or provide a primaryjoin condition marking writable columns with the foreign() annotation.
  for (pr, fr_) in other_props)
2017-11-27 14:37:24,007 medic.parser CRITICAL: error while parsing PMID 123456
Traceback (most recent call last):
  File "K:/Python/Anaconda/Scripts/medic", line 345, in <module>
    result = Main(args.command, args.files, Session(), not args.all)
  File "K:/Python/Anaconda/Scripts/medic", line 36, in Main
    return insert(session, files_or_pmids, unique)
  File "K:\Python\Anaconda\lib\site-packages\medic\crud.py", line 32, in insert
    _add(session, files_or_pmids, lambda i: session.add(i), uniq)
  File "K:\Python\Anaconda\lib\site-packages\medic\crud.py", line 178, in _add
    count += _downloadAll(session, dbHandle, pmids, unique)
  File "K:\Python\Anaconda\lib\site-packages\medic\crud.py", line 289, in _downloadAll
    return sum(map(streaming, chain(instances)))
  File "K:\Python\Anaconda\lib\site-packages\medic\crud.py", line 211, in _streamInstances
    for citation in _collectCitation(stream):
  File "K:\Python\Anaconda\lib\site-packages\medic\crud.py", line 230, in _collectCitation
    for instance in stream:
  File "K:\Python\Anaconda\lib\site-packages\medic\parser.py", line 81, in parse
    for instance in self.yieldInstances(element):
  File "K:\Python\Anaconda\lib\site-packages\medic\parser.py", line 108, in yieldInstances
    for i in self.yieldFromGenerator(element):
  File "K:\Python\Anaconda\lib\site-packages\medic\parser.py", line 116, in yieldFromGenerator
    instance = getattr(self, element.tag)(element)
  File "K:\Python\Anaconda\lib\site-packages\medic\parser.py", line 490, in MedlineCitation
    return Parser.MedlineCitation(self, element)
  File "K:\Python\Anaconda\lib\site-packages\medic\parser.py", line 146, in MedlineCitation
    created = options['created']
KeyError: 'created'
$ medic --url sqlite:///medline.db write 123

K:\Python\Anaconda\lib\site-packages\sqlalchemy\orm\relationships.py:2694: SAWarning: relationship 'Qualifier.citation' will copy column citations.pmid to column qualifiers.pmid, which conflicts with relationship(s): 'Qualifier.descriptor' (copies descriptors.pmid to qualifiers.pmid), 'Descriptor.qualifiers' (copies descriptors.pmid to qualifiers.pmid). Consider applying viewonly=True to read-only relationships, or provide a primaryjoin condition marking writable columns with the foreign() annotation.
  for (pr, fr_) in other_props)
K:\Python\Anaconda\lib\site-packages\sqlalchemy\orm\relationships.py:2694: SAWarning: relationship 'Citation.qualifiers' will copy column citations.pmid to column qualifiers.pmid, which conflicts with relationship(s): 'Qualifier.descriptor' (copies descriptors.pmid to qualifiers.pmid), 'Descriptor.qualifiers' (copies descriptors.pmid to qualifiers.pmid). Consider applying viewonly=True to read-only relationships, or provide a primaryjoin condition marking writable columns with the foreign() annotation.
  for (pr, fr_) in other_props)
K:\Python\Anaconda\lib\site-packages\sqlalchemy\orm\relationships.py:2694: SAWarning: relationship 'Section.citation' will copy column citations.pmid to column sections.pmid, which conflicts with relationship(s): 'Section.abstract' (copies abstracts.pmid to sections.pmid), 'Abstract.sections' (copies abstracts.pmid to sections.pmid). Consider applying viewonly=True to read-only relationships, or provide a primaryjoin condition marking writable columns with the foreign() annotation.
  for (pr, fr_) in other_props)
K:\Python\Anaconda\lib\site-packages\sqlalchemy\orm\relationships.py:2694: SAWarning: relationship 'Citation.sections' will copy column citations.pmid to column sections.pmid, which conflicts with relationship(s): 'Section.abstract' (copies abstracts.pmid to sections.pmid), 'Abstract.sections' (copies abstracts.pmid to sections.pmid). Consider applying viewonly=True to read-only relationships, or provide a primaryjoin condition marking writable columns with the foreign() annotation.
  for (pr, fr_) in other_props)

fnl commented 6 years ago

Seems very peculiar indeed, then. When I have some time to spare, I can try checking if medic still works with OSX and/or Linux. However, note, I have no access to Windows machines.

breisfeld commented 6 years ago

I tested on OS X (High Sierra, 10.13.1 (17B48)) and get the same warnings.

I wonder if this has to do with differences in the versions of sqlalchemy we are using.

Here is info for my system.

Python 3.6.3 | packaged by conda-forge | (default, Nov  4 2017, 10:13:32)
Type "help", "copyright", "credits" or "license" for more information.
>>> import sqlalchemy
>>> sqlalchemy.__version__
'1.1.13'

fnl commented 6 years ago

Could totally be the case; when I developed and worked with this tool, SQLAlchemy was still at version 0.8 (see the setup.py).

fnl commented 6 years ago

Confirmed. Sadly, that means something in SQLAlchemy has changed sufficiently to break backwards compatibility. As I don't have enough time to spend on fixing that level of detail, I guess that has put medic out of service. PRs that fix this are welcome; I am sorry to say that I have no ETA on when I can get this fixed myself.

fnl commented 6 years ago

If it helps, for anyone looking into this, I just tried the various SQLAlchemy versions, including 0.8 that was around when I developed this tool, and none of them fixes the problem. So I guess it's some change to SQLite or the Python API of the same.

breisfeld commented 6 years ago

I greatly appreciate all of the time you already put into this software and understand your time constraints. I am not a database person, so I don't know if the warnings are pointing to anything significant or just what the current sqlalchemy considers poor practices. I may try to dig into this if I can find some time.
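
For anyone who only wants to hide these messages while investigating, here is a minimal sketch using Python's standard `warnings` module. It filters on the message text quoted above; the helper names are made up for illustration, and `emit_demo_warnings` is just a stand-in for the library so the snippet runs without SQLAlchemy. It does not change the relationship mapping, so it says nothing about whether the warnings are significant, and it has no bearing on the `KeyError: 'created'` traceback.

```python
import warnings

# Regex matching the start of the relationship warnings quoted in this thread.
# warnings.filterwarnings() matches this against the beginning of the message.
OVERLAP_PATTERN = r"relationship '.*' will copy column"


def silence_overlap_warnings():
    """Ignore only warnings whose message starts with OVERLAP_PATTERN."""
    warnings.filterwarnings("ignore", message=OVERLAP_PATTERN)


def emit_demo_warnings():
    """Stand-in for the library: one overlap-style and one unrelated warning."""
    warnings.warn(
        "relationship 'Qualifier.citation' will copy column "
        "citations.pmid to column qualifiers.pmid",
        RuntimeWarning,
    )
    warnings.warn("something unrelated", RuntimeWarning)


def demo():
    """Return the messages that still get through after filtering."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")   # show everything by default
        silence_overlap_warnings()        # then hide the overlap messages
        emit_demo_warnings()
    return [str(w.message) for w in caught]


if __name__ == "__main__":
    print(demo())
```

In a real session the filter would presumably have to be installed before the models are imported and the mappers configured, since that is when such warnings are emitted.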

fnl commented 6 years ago

Well, as I tried building medic in environments with all SQLAlchemy versions back to 0.8, and none of them works, I doubt that this is a SQLAlchemy problem at all. It seems weird that this issue would pop up now but never did before with the same earlier versions (I think I was using, e.g., 0.9 without trouble). But yes, I could be wrong on that count, e.g., if medic was only working due to some bug in SQLAlchemy that got patched in later versions of the 0.8 release.

fnl commented 6 years ago

BTW, if I'm correct, medic should still work with PostgreSQL.

breisfeld commented 6 years ago

That is odd. It seems like the messages are emitted by SQLAlchemy, but perhaps those are a side effect of changes to sqlite/pysqlite.
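
To compare the SQLite side between machines, the standard library exposes the version of the SQLite library that Python is linked against. A small sketch (the function name and dictionary keys are chosen here for illustration):

```python
import sqlite3
import sys


def sqlite_environment():
    """Collect the versions worth comparing between two setups."""
    return {
        # Version of the running Python interpreter, e.g. "3.6.3".
        "python": sys.version.split()[0],
        # Version of the SQLite C library the sqlite3 module is linked against.
        "sqlite_library": sqlite3.sqlite_version,
    }


if __name__ == "__main__":
    for name, version in sqlite_environment().items():
        print(f"{name}: {version}")
```

Running this alongside `sqlalchemy.__version__` on a working and a failing setup would show whether the SQLite library differs between them.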

byeongchul commented 4 years ago

I faced a similar problem with the 2020 baseline data, and found that there is no DateCreated element in the current PubMed DTD.

$ medic --url sqlite:///tmp.db insert 123456 

...

 in MedlineCitation
    return Parser.MedlineCitation(self, element)
  File "/home/bckang/.conda/envs/nlp/lib/python3.7/site-packages/medic/parser.py", line 146, in MedlineCitation
    created = options['created']
KeyError: 'created'

byeongchul commented 4 years ago

As I understand it, the <DateCreated> element was removed in 2018 (https://www.nlm.nih.gov/bsd/licensee/elements_descriptions.html)
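
A rough sketch of how a parser could tolerate the missing element, using only the standard library. This is not medic's actual code: the function names are hypothetical, and the sample record and its dates are invented for illustration. The point is simply to read the optional date with `.get()` instead of `options['created']`, so an absent `<DateCreated>` yields `None` rather than a `KeyError`.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Invented sample record without a <DateCreated> element (illustration only).
SAMPLE = """<MedlineCitation Status="MEDLINE" Owner="NLM">
  <PMID Version="1">123456</PMID>
  <DateCompleted><Year>2000</Year><Month>01</Month><Day>02</Day></DateCompleted>
  <DateRevised><Year>2001</Year><Month>03</Month><Day>04</Day></DateRevised>
</MedlineCitation>"""


def parse_date(element):
    """Turn a <Year>/<Month>/<Day> container into a date; None if absent."""
    if element is None:
        return None
    return date(
        int(element.findtext("Year")),
        int(element.findtext("Month")),
        int(element.findtext("Day")),
    )


def citation_dates(xml_text):
    """Return (created, completed, revised); missing elements become None."""
    root = ET.fromstring(xml_text)
    options = {}
    for key, tag in (
        ("created", "DateCreated"),
        ("completed", "DateCompleted"),
        ("revised", "DateRevised"),
    ):
        parsed = parse_date(root.find(tag))
        if parsed is not None:
            options[key] = parsed
    # .get() returns None for a missing key instead of raising KeyError.
    return options.get("created"), options.get("completed"), options.get("revised")


if __name__ == "__main__":
    print(citation_dates(SAMPLE))
```

Whether medic's database schema accepts a missing creation date is a separate question that this sketch does not answer.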
