biblissima / collatinus

Sources of Collatinus software - Latin lemmatizer, morphological analyzer and scansion
http://outils.biblissima.fr/en/collatinus
GNU General Public License v3.0

Format of .col files #64

Closed nivaca closed 3 years ago

nivaca commented 3 years ago

Hi, I would like to decompress the .col files (the dictionaries), but I have not found the proper format they are in. Could you please help me with that? Thanks. Nicolas Vaughan

PhVerkerk commented 3 years ago

Dear Nicolas,

The .col files are in a strange binary format that contains strings compressed and decompressed using Qt qCompress and qUncompress functions. I can't tell you more about these functions, except that they are zip-like but not zip-compatible.

Why do you want or need to decompress the .col files?

Yours,

    Philippe.
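For readers who want to try this outside Qt, here is a minimal sketch in Python. It relies on the layout the Qt documentation gives for qCompress output: a four-byte, big-endian length of the uncompressed data, followed by an ordinary zlib stream. The function names are hypothetical. How the compressed strings are located inside a .col file (offsets, index, separators) is not described in this thread, so the sketch only decodes a single blob that has already been cut out of the file.

```python
import struct
import zlib


def q_uncompress(blob: bytes) -> bytes:
    """Decode one byte string in the layout produced by Qt's qCompress.

    Assumption: `blob` is exactly one qCompress result, i.e. a 4-byte
    big-endian uncompressed length followed by a zlib stream.
    """
    if len(blob) < 4:
        raise ValueError("blob is too short to hold the 4-byte length header")
    (expected_len,) = struct.unpack(">I", blob[:4])
    data = zlib.decompress(blob[4:])
    if len(data) != expected_len:
        raise ValueError(
            f"length header says {expected_len} bytes, got {len(data)}"
        )
    return data


def q_compress(data: bytes) -> bytes:
    """Build a blob in the same layout (used here only for a round trip)."""
    return struct.pack(">I", len(data)) + zlib.compress(data)
```

A round trip such as `q_uncompress(q_compress(b"rosa"))` returns the original bytes. The four-byte prefix is also why generic zip tools do not recognise the data.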


nivaca commented 3 years ago

Dear Philippe, Thanks for your reply. I would like to have direct access to the dictionary contents. Best wishes, Nicolas

ycollatin commented 3 years ago
Hi all,

On 1 April 2021, at 09:40, Nicolas Vaughan wrote:

Thanks for your reply. I would like to have direct access to the dictionary contents. Best wishes,

A good way to do that is to install Collatinus, and then to pick the dictionaries you need in .../directory_where_is_collatinus/data/lemmes.*

Cheers, -- Yves Ouvrard

nivaca commented 3 years ago

Hi Yves, Thanks for your suggestion. It worked fine.

Edit: However, I now get some .cz files which are still compressed and hence inaccessible.

Best, Nicolas

PhVerkerk commented 3 years ago

I assume that you have the same problem with the .cz format (c for Collatinus and z for Zipped, but with Qt qCompress). The underlying string-format is HTML but the information is the same as the original TeX (for Gaffiot) or XML (for L&S) files, which are available in different places :

The recently added dictionaries are available on the web :

Yours,

    Philippe.


nivaca commented 3 years ago

Dear Philippe,

Thanks for your message. The XML version of the L&S was very interesting. I was also interested in the Ramminger, but I found no downloadable version in the web page you mentioned.

All best,

Nicolas

PhVerkerk commented 3 years ago

That's right: I have a collection of 21,115 files. I will try to change my program to produce a readable HTML file.

Philippe.


PhVerkerk commented 3 years ago

You'll find the HTML version of Ramminger's dictionary here: https://filesender.renater.fr/?s=download&token=7fa545bc-a3a0-45c2-97a5-47599c157db0

As Collatinus does not need the entries in the dictionary file to be in alphabetical order, the lemmata come in the order of the numbered original files. I would have to change my program dramatically to sort the entries, so I hope this is fine for you. If needed, you can find the alphabetically ordered list of the entries in the first column of Ramminger_2020-jan20.idx, which you have in Collatinus (copy it, rename it to .csv and open it with LibreOffice, for instance; the ":" is the column separator).

Yours,

    Philippe.
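As an alternative to the spreadsheet route, the first column can be pulled out with a few lines of Python. This is only a sketch: the thread states that ":" is the column separator and that the first column holds the entries, while the helper name and the sample lines in the comments below are made up for illustration, and the file encoding is an assumption.

```python
def headwords(idx_text: str, sep: str = ":") -> list[str]:
    """Return the first column of every non-empty line of an .idx file.

    Only the ":" separator and the meaning of the first column come from
    the thread; the other columns are ignored.
    """
    words = []
    for line in idx_text.splitlines():
        if line.strip():
            # Split once: everything before the first separator is the entry.
            words.append(line.split(sep, 1)[0])
    return words


# Made-up sample lines; the real file's remaining columns are not shown here.
sample = "abbas:0:120\n\nzelus:120:85\n"
entries = headwords(sample)
```

To run it on the real index, read the file first, for example `headwords(Path("Ramminger_2020-jan20.idx").read_text(encoding="utf-8"))` with `from pathlib import Path`; UTF-8 is a guess and may need changing.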


nivaca commented 3 years ago

Dear Philippe,

You are very kind!

All best wishes, Nicolas