Humans-of-Julia / BibParser.jl

Parser for bibliographic formats, including BibTeX, in pure Julia
MIT License

BibParser copies missing fields from previous entries #28

Open adrian-tymorek opened 2 years ago

adrian-tymorek commented 2 years ago

I think there is strange behavior when some entries have missing fields. Suppose that entries 1 and 4 have an abstract field, but entries 2 and 3 do not. In that case an abstract field is added to entries 2 and 3, and its value is copied from the first entry. I've noticed this behavior for the abstract, comment and x-color fields, but I guess it may affect other fields too.

Here is a minimal example:

@article{ashfahani_2019_continual_DL,
    abstract = { The feasibility of deep neural networks (DNNs) to address
                  data stream problems still requires intensive study because of
                  the static and offline nature of conventional deep learning
                  approaches. A deep continual learning algorithm, namely
                  autonomous deep learning (ADL), is proposed in this paper.
                  Unlike traditional deep learning methods, ADL features a
                  flexible structure where its network structure can be
                  constructed from scratch with the absence of an initial
                  network structure via the self-constructing network structure.
                  ADL specifically addresses catastrophic forgetting by having a
                  different-depth structure which is capable of achieving a
                  trade-off between plasticity and stability. Network
                  significance (NS) formula is proposed to drive the hidden
                  nodes growing and pruning mechanism. Drift detection scenario
                  (DDS) is put forward to signal distributional changes in data
                  streams which induce the creation of a new hidden layer. The
                  maximum information compression index (MICI) method plays an
                  important role as a complexity reduction module eliminating
                  redundant layers. The efficacy of ADL is numerically validated
                  under the prequential test-then-train procedure in lifelong
                  environments using nine popular data stream problems. The
                  numerical results demonstrate that ADL consistently
                  outperforms recent continual learning methods while
                  characterizing the automatic construction of network
                  structures. },
    archiveprefix = {arXiv},
    author = {Andri Ashfahani and Mahardhika Pratama},
    comment = {published = 2018-10-17T01:40:45Z, updated = 2020-01-09T12:19:19Z},
    doi = {10.1137/1.9781611975673.75},
    eprint = {1810.07348v4},
    month = jan,
    primaryclass = {cs.LG},
    title = {Autonomous Deep Learning: Continual Learning Approach for Dynamic Environments},
    url = {http://arxiv.org/abs/1810.07348v4; http://arxiv.org/pdf/1810.07348v4},
    x-color = {#cc3300},
    x-fetchedfrom = {arXiv.org},
    year = 2019
}

@article{ashfahani_2020_DEVDAN,
    added-at = {2020-05-08T00:00:00.000+0200},
    author = {Andri Ashfahani and Mahardhika Pratama and Edwin Lughofer and Yew-Soon Ong},
    biburl = {https://www.bibsonomy.org/bibtex/2f01e837afa1ecc4df48befc53e43f458/dblp},
    ee = {https://doi.org/10.1016/j.neucom.2019.07.106},
    interhash = {d8ce7807e54d80e379324b2c3b4cd6df},
    intrahash = {f01e837afa1ecc4df48befc53e43f458},
    journal = {Neurocomputing},
    pages = {297--314},
    timestamp = {2020-05-09T11:39:11.000+0200},
    title = {DEVDAN: Deep evolving denoising autoencoder.},
    url = {http://dblp.uni-trier.de/db/journals/ijon/ijon390.html#AshfahaniPLO20},
    volume = 390,
    x-fetchedfrom = {Bibsonomy},
    year = 2020
}

Azzaare commented 2 years ago

Thanks! I will have a look as soon as possible.

jagot commented 4 weeks ago

I was bitten by this bug as well, and it seems to be case-sensitive. Consider the following example:

@book{Grant2007,
  Address =      {New York},
  Author =       {Grant, Ian P.},
  Isbn =         {978-0-387-34671-7},
  Publisher =    {Springer},
  Title =        {Relativistic quantum theory of atoms and molecules:
                  {T}heory and computation},
  Year =         2007
}

@article{Stone2005,
  author =       {N.J. Stone},
  title =        {Table of Nuclear Magnetic Dipole and Electric
                  Quadrupole Moments},
  journal =      {Atomic Data and Nuclear Data Tables},
  volume =       90,
  number =       1,
  pages =        {75-176},
  year =         2005,
  doi =          {10.1016/j.adt.2005.04.001},
  url =          {http://dx.doi.org/10.1016/j.adt.2005.04.001},
}

@book{FroeseFischer1997,
  Address =      {Bristol, UK Philadelphia, Penn},
  Author =       {Froese Fischer, Charlotte and Brage, Tomas and
                  Jönsson, Per},
  Isbn =         {0-7503-0466-9},
  Publisher =    {Institute of Physics Publ},
  Title =        {Computational atomic structure : an {MCHF} approach},
  Year =         1997
}

@article{Javanainen1988,
  author =       {J. Javanainen and J. H. Eberly and Qichang Su},
  title =        {Numerical Simulations of Multiphoton Ionization and
                  Above-Threshold Electron Spectra},
  journal =      {Physical Review A},
  volume =       38,
  number =       7,
  pages =        {3430-3446},
  year =         1988,
  doi =          {10.1103/physreva.38.3430},
  url =          {http://dx.doi.org/10.1103/PhysRevA.38.3430},
}

Note that Author and author differ only in case. Running BibParser.parse_file on this input gives

julia> BibParser.parse_file("docs/src/debug.bib")
OrderedCollections.OrderedDict{String, BibInternal.Entry} with 4 entries:
  "Grant2007"         => Entry(Access("", "", ""), Name[Name("", "Grant", "", "Ian", "P.")], "", Date("", "", "2007"), Name[], Eprint("", "", ""), "Grant2007", In("New York", "", "", "", "", "", "", "", "Springer", "", "", ""), Dict("isbn"=>"978-0-387-34671-7"), "Relativistic quantum theory of atoms and molecules:\n…
  "Stone2005"         => Entry(Access("10.1016/j.adt.2005.04.001", "", "http://dx.doi.org/10.1016/j.adt.2005.04.001"), Name[Name("", "Grant", "", "Ian", "P.")], "", Date("", "", "2007"), Name[], Eprint("", "", ""), "Stone2005", In("New York", "", "", "", "Atomic Data and Nuclear Data Tables", "1", "", "75-176", "Spr…
  "FroeseFischer1997" => Entry(Access("", "", ""), Name[Name("", "Froese Fischer", "", "Charlotte", ""), Name("", "Brage", "", "Tomas", ""), Name("", "Jönsson", "", "Per", "")], "", Date("", "", "1997"), Name[], Eprint("", "", ""), "FroeseFischer1997", In("Bristol, UK Philadelphia, Penn", "", "", "", "", "", "", "",…
  "Javanainen1988"    => Entry(Access("10.1103/physreva.38.3430", "", "http://dx.doi.org/10.1103/PhysRevA.38.3430"), Name[Name("", "Froese Fischer", "", "Charlotte", ""), Name("", "Brage", "", "Tomas", ""), Name("", "Jönsson", "", "Per", "")], "", Date("", "", "1997"), Name[], Eprint("", "", ""), "Javanainen1988", I…

whereas changing the order of the entries, or using consistent case for the field names in all entries, instead yields

OrderedCollections.OrderedDict{String, BibInternal.Entry} with 4 entries:
  "Grant2007"         => Entry(Access("", "", ""), Name[Name("", "Grant", "", "Ian", "P.")], "", Date("", "", "2007"), Name[], Eprint("", "", ""), "Grant2007", In("New York", "", "", "", "", "", "", "", "Springer", "", "", ""), Dict("isbn"=>"978-0-387-34671-7"), "Relativistic quantum theory of atoms and molecules:\n…
  "Stone2005"         => Entry(Access("10.1016/j.adt.2005.04.001", "", "http://dx.doi.org/10.1016/j.adt.2005.04.001"), Name[Name("", "Stone", "", "N.J.", "")], "", Date("", "", "2005"), Name[], Eprint("", "", ""), "Stone2005", In("New York", "", "", "", "Atomic Data and Nuclear Data Tables", "1", "", "75-176", "Spri…
  "FroeseFischer1997" => Entry(Access("", "", ""), Name[Name("", "Froese Fischer", "", "Charlotte", ""), Name("", "Brage", "", "Tomas", ""), Name("", "Jönsson", "", "Per", "")], "", Date("", "", "1997"), Name[], Eprint("", "", ""), "FroeseFischer1997", In("Bristol, UK Philadelphia, Penn", "", "", "", "", "", "", "",…
  "Javanainen1988"    => Entry(Access("10.1103/physreva.38.3430", "", "http://dx.doi.org/10.1103/PhysRevA.38.3430"), Name[Name("", "Javanainen", "", "J.", ""), Name("", "Eberly", "", "J.", " H."), Name("", "Su", "", "Qichang", "")], "", Date("", "", "1988"), Name[], Eprint("", "", ""), "Javanainen1988", In("Bristol,…

as expected.
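
Until the underlying bug is fixed, one possible workaround for the case-related variant is to lowercase the field names before parsing. The following is a minimal sketch in plain Julia and is not part of BibParser.jl: lowercase_field_names is a hypothetical helper, and it assumes that every field starts on its own line, as in the examples above. It does not address the original report, where all field names are already lowercase. Even in the second output above, Stone2005 still shows New York in its In(...) field.

```julia
# Hypothetical helper (not part of BibParser.jl): lowercase BibTeX field names.
# Assumption: each field starts on its own line as `Name = value`.
function lowercase_field_names(bibtex::AbstractString)
    # Optional indentation, a field name, optional spaces, then `=`.
    # The `m` flag makes `^` match at the start of every line.
    # Entry headers such as `@book{Key,` start with `@` and are not matched.
    pattern = r"^[ \t]*[A-Za-z][A-Za-z0-9_:-]*[ \t]*="m
    # The match contains only whitespace, the field name, and `=`,
    # so lowercasing the whole match changes nothing but the name.
    return replace(bibtex, pattern => lowercase)
end

sample = """
@book{Grant2007,
  Author = {Grant, Ian P.},
  Year =   2007
}
"""

print(lowercase_field_names(sample))
```

The normalized text can then be written to a temporary file and passed to BibParser.parse_file as before. A line inside a multi-line value that happens to begin with a word followed by = would also be lowercased, so treat this as a stopgap rather than a fix.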