gnames / gnparser

GNparser normalises scientific names and extracts their semantic elements.
MIT License

panic runtime error with names containing ":" two or more times #189

Closed abubelinha closed 2 years ago

abubelinha commented 3 years ago

Hello again. I am testing gnparser with some difficult names, and I found that its behaviour differs depending on how the name is supplied (Windows 7):

c:\>C:\gnparser-v1.3.3-win-64\gnparser -f pretty "Plantago paludosa FIORI, A. - Nuova flora analitica d'ltalia (ed. 2). S.n., Firenze, 1923-1929. 1/1: [1]-160. 1923 (mars); 1/2: 161-320. 1923 (juil.); 1/3: 321-480. 1923 (déc.);"

panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xc0000005 code=0x0 addr=0x28 pc=0x9332b9]

goroutine 1 [running]:
github.com/gnames/gnsys.FileExists(0xc000118000, 0xb2, 0xc350, 0x0, 0x1f90)
        /home/dimus/go/pkg/mod/github.com/gnames/gnsys@v0.1.1/gnsys.go:33 +0x99
github.com/gnames/gnparser/gnparser/cmd.parse(0xc000118000, 0xb2, 0x2, 0x4, 0xc350, 0x0, 0x1f90, 0x0, 0x0)
        /home/dimus/code/golang/gnparser/gnparser/cmd/root.go:206 +0x85
github.com/gnames/gnparser/gnparser/cmd.glob..func1(0xe3ba00, 0xc0000aa000, 0x1, 0x3)
        /home/dimus/code/golang/gnparser/gnparser/cmd/root.go:102 +0x245
github.com/spf13/cobra.(*Command).execute(0xe3ba00, 0xc00004a050, 0x3, 0x3, 0xe3ba00, 0xc00004a050)
        /home/dimus/go/pkg/mod/github.com/spf13/cobra@v1.1.1/command.go:854 +0x2c2
github.com/spf13/cobra.(*Command).ExecuteC(0xe3ba00, 0xe46be0, 0x0, 0xc000039f78)
        /home/dimus/go/pkg/mod/github.com/spf13/cobra@v1.1.1/command.go:958 +0x375
github.com/spf13/cobra.(*Command).Execute(...)
        /home/dimus/go/pkg/mod/github.com/spf13/cobra@v1.1.1/command.go:895
github.com/gnames/gnparser/gnparser/cmd.Execute()
        /home/dimus/code/golang/gnparser/gnparser/cmd/root.go:110 +0x38
main.main()
        /home/dimus/code/golang/gnparser/gnparser/main.go:26 +0x27

At first glance, I thought there might be a limit on the length of the names that can be parsed this way. Oddly, though, when I added some random words to the end of the name to make it longer, gnparser parsed it correctly (warning about the long tail, of course, but without raising any errors).

Please compare (same name, but with a long random string appended on the right):

c:\>C:\gnparser-v1.3.3-win-64\gnparser -f pretty  "Plantago paludosa FIORI, A. - Nuova flora analitica d'ltalia (ed. 2). S.n., Firenze, 1923-1929. 1/1: [1]-160. 1923 (mars); 1/2: 161-320. 1923 (juil.); 1/3: 321-480. 1923 (déc.); bombom chekevara bebechura conga bombom chekevara bebechura conga bombom chekevara bebechura conga"
{
  "parsed": true,
  "quality": 4,
  "qualityWarnings": [
    {
      "quality": 4,
      "warning": "Unparsed tail"
    },
    {
      "quality": 2,
      "warning": "Author in upper case"
    }
  ],
  "verbatim": "Plantago paludosa FIORI, A. - Nuova flora analitica d'ltalia (ed. 2). S.n., Firenze, 1923-1929. 1/1: [1]-160. 1923 (mars); 1/2: 161-320. 1923 (juil.); 1/3: 321-480. 1923 (déc.); bombom chekevara bebechura conga bombom chekevara bebechura conga bombom chekevara bebechura conga",
  "normalized": "Plantago paludosa Fiori \u0026 A.",
  "canonical": {
    "stemmed": "Plantago paludos",
    "simple": "Plantago paludosa",
    "full": "Plantago paludosa"
  },
  "cardinality": 2,
  "authorship": {
    "verbatim": "FIORI, A.",
    "normalized": "Fiori \u0026 A.",
    "authors": [
      "Fiori",
      "A."
    ]
  },
  "tail": " - Nuova flora analitica d'ltalia (ed. 2). S.n., Firenze, 1923-1929. 1/1: [1]-160. 1923 (mars); 1/2: 161-320. 1923 (juil.); 1/3: 321-480. 1923 (déc.); bombom chekevara bebechura conga bombom chekevara bebechura conga bombom chekevara bebechura conga",
  "id": "e0c79987-32e7-5c43-9f07-1aee82442af6",
  "parserVersion": "v1.3.3"
}

I found the same problem in several names (source):

Androsace imbricata CADEVALL I DIARS, J. & P. FONT I QUER - 9. - Flora de Catalunya. Institut d'Estudis Catalans, Barcelona, 1934-1937. 4: [viii], [1]-481. "1932" [1934]; 5: [1]-45
Plantago altissima JAHANDIEZ, E. & R. MAIRE - Catalogue des plantes du Maroc. Minerva, Lechevalier, Alger, 1931-1934. 1: [I]-XL, [1]-[160]. 1931; 2: [vi], [161]-[558]. 1932; 3: [L
Plantago paludosa FIORI, A. - Nuova flora analitica d'ltalia (ed. 2). S.n., Firenze, 1923-1929. 1/1: [1]-160. 1923 (mars); 1/2: 161-320. 1923 (juil.); 1/3: 321-480.1923 (déc.);

I guess it has something to do with names that contain the ":" character at least twice AND are shorter than 230 characters (the problem does not happen if the name is longer than 230 characters).

dimus commented 2 years ago
github.com/gnames/gnsys.FileExists(0xc000118000, 0xb2, 0xc350, 0x0, 0x1f90)
        /home/dimus/go/pkg/mod/github.com/gnames/gnsys@v0.1.1/gnsys.go:33 +0x99

Hm, it looks like a Windows file system problem, maybe a Go bug? On Linux I get the following:

✦ ❯ gnparser tmp.txt
Id,Verbatim,Cardinality,CanonicalStem,CanonicalSimple,CanonicalFull,Authorship,Year,Quality
024bbd31-47f2-57e1-ba72-6b30381974ca,"Androsace imbricata CADEVALL I DIARS, J. & P. FONT I QUER - 9. - Flora de Catalunya. Institut d'Estudis Catalans, Barcelona, 1934-1937. 4: [viii], [1]-481. ""1932"" [1934]; 5: [1]-45",2,Androsace imbricat,Androsace imbricata,Androsace imbricata,"Cadevall I Diars, J. & P. Font I Quer",,4
5ccfcb2c-4ba6-57a6-a2cc-f7949f3dfafb,"Plantago altissima JAHANDIEZ, E. & R. MAIRE - Catalogue des plantes du Maroc. Minerva, Lechevalier, Alger, 1931-1934. 1: [I]-XL, [1]-[160]. 1931; 2: [vi], [161]-[558]. 1932; 3: [L",2,Plantago altissim,Plantago altissima,Plantago altissima,"Jahandiez, E. & R. Maire",,4
646aeda0-ec31-5c55-a0f0-7bc56b10ae76,"Plantago paludosa FIORI, A. - Nuova flora analitica d'ltalia (ed. 2). S.n., Firenze, 1923-1929. 1/1: [1]-160. 1923 (mars); 1/2: 161-320. 1923 (juil.); 1/3: 321-480.1923 (déc.);",2,Plantago paludos,Plantago paludosa,Plantago paludosa,Fiori & A.,,4
a9456e61-bd30-53bc-8588-accb913cc64a,,0,,,,,,0

I will update Go to 1.17; let's see if the problem persists.
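
For reference, the trace in the report ends in `gnsys.FileExists`, which suggests the CLI first checks whether its argument is a file path before treating it as a name. The sketch below is only an illustration of a defensive version of such a check. It assumes (this thread does not confirm it) that the panic came from using the result of `os.Stat` after an error other than "does not exist". `fileExistsSafe` is a hypothetical helper, not the gnsys function.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
)

// fileExistsSafe is a hypothetical helper, not the gnsys implementation.
// It reports whether path points to a regular file and never touches the
// FileInfo value when os.Stat has returned an error.
func fileExistsSafe(path string) (bool, error) {
	info, err := os.Stat(path)
	if err != nil {
		// A missing file is an expected outcome, not a failure.
		if errors.Is(err, fs.ErrNotExist) {
			return false, nil
		}
		// Any other error (for example, an operating system rejecting
		// the string as a path) is returned to the caller; info is nil
		// here and must not be dereferenced.
		return false, err
	}
	// Directories and other non-regular entries do not count as files.
	return info.Mode().IsRegular(), nil
}

func main() {
	// A shortened, made-up input with two ":" characters.
	input := "Plantago paludosa FIORI, A. 1: [1]-160. 1923; 2: 161-320."
	isFile, err := fileExistsSafe(input)
	if err != nil || !isFile {
		// Assumed fallback: treat the argument as a name string.
		fmt.Println("not a readable file, treating input as a name")
		return
	}
	fmt.Println("input is a file")
}
```

With a check of this shape, an argument that the operating system refuses to interpret as a path would fall through to name parsing instead of crashing, whatever the underlying `os.Stat` error turns out to be.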

dimus commented 2 years ago

@abubelinha, can you try v1.4.0 and see if this problem persists for you on Windows 7?

dimus commented 2 years ago

I tried on Windows 10:

PS C:\Users\dmozz\tmp> gnparser.exe -V

version: v1.4.0

build:   2021-09-04_13:17:01UTC

PS C:\Users\dmozz\tmp> gnparser.exe .\names.txt
Id,Verbatim,Cardinality,CanonicalStem,CanonicalSimple,CanonicalFull,Authorship,Year,Quality
024bbd31-47f2-57e1-ba72-6b30381974ca,"Androsace imbricata CADEVALL I DIARS, J. & P. FONT I QUER - 9. - Flora de Catalunya. Institut d'Estudis Catalans, Barcelona, 1934-1937. 4: [viii], [1]-481. ""1932"" [1934]; 5: [1]-45",2,Androsace imbricat,Androsace imbricata,Androsace imbricata,"Cadevall I Diars, J. & P. Font I Quer",,4
5ccfcb2c-4ba6-57a6-a2cc-f7949f3dfafb,"Plantago altissima JAHANDIEZ, E. & R. MAIRE - Catalogue des plantes du Maroc. Minerva, Lechevalier, Alger, 1931-1934. 1: [I]-XL, [1]-[160]. 1931; 2: [vi], [161]-[558]. 1932; 3: [L",2,Plantago altissim,Plantago altissima,Plantago altissima,"Jahandiez, E. & R. Maire",,4
646aeda0-ec31-5c55-a0f0-7bc56b10ae76,"Plantago paludosa FIORI, A. - Nuova flora analitica d'ltalia (ed. 2). S.n., Firenze, 1923-1929. 1/1: [1]-160. 1923 (mars); 1/2: 161-320. 1923 (juil.); 1/3: 321-480.1923 (déc.);",2,Plantago paludos,Plantago paludosa,Plantago paludosa,Fiori & A.,,4

PS C:\Users\dmozz\tmp> gnparser.exe "Androsace imbricata CADEVALL I DIARS, J. & P. FONT I QUER - 9. - Flora de Catalunya. Institut d'Estudis Catalans, Barcelona, 1934-1937. 4: [viii], [1]-481. 1932 [1934]; 5: [1]-45"
Id,Verbatim,Cardinality,CanonicalStem,CanonicalSimple,CanonicalFull,Authorship,Year,Quality
9b1de195-f594-58ff-9519-6b49b8d4890f,"Androsace imbricata CADEVALL I DIARS, J. & P. FONT I QUER - 9. - Flora de Catalunya. Institut d'Estudis Catalans, Barcelona, 1934-1937. 4: [viii], [1]-481. 1932 [1934]; 5: [1]-45",2,Androsace imbricat,Androsace imbricata,Androsace imbricata,"Cadevall I Diars, J. & P. Font I Quer",,4

dimus commented 2 years ago

I did not get further feedback, so I am closing this for now.

abubelinha commented 2 years ago

Sorry for taking so long to answer.

Yes, I downloaded v1.4.0 and it worked properly on Windows 7.

Thanks so much!

dimus commented 2 years ago

Great, thank you for the feedback