Move device support data to external files #178

Closed DavidGriffith closed 11 months ago

In GitLab by @eschaton on Jun 29, 2020, 19:50

Right now if I want to add (say) M5M27C202JK support, I have to modify the minipro code directly. Rather than compile the set of supported devices into the code, it would be convenient to have them described in separate data files, e.g. $PREFIX/share/minipro and/or .d.

In GitLab by @elosha on Jul 28, 2020, 09:31

In #183, @radiomanV said:

[…] my infoic.dll dumper can save in three different formats: C header, ini file or xml. I think the ini is the most easily to understand.
We only need a good ini parser to load the file and the problem is solved.
What do you think?

I haven't tried a INI parser yet, but sounds good.

Oh wow, I just noticed the databases would be ~13MB as text … I hope performance impact is not a problem, parsing so many strings.

I wonder what happens if you download a new version of minipro and overwrite the database with the current version, after the user has added a new device. For that case a per-user config additional to the system-wide database would be required.

Quite a few consequences if you think about it ;)

In GitLab by @radiomanV on Jul 28, 2020, 11:07

Oh wow, I just noticed the databases would be ~13MB as text.

Both databases in C header format.

I hope performance impact is not a problem, parsing so many strings.

I don't think so but i'll do some tests.
Here are the TL866A/CS database in INI and XML format. Both have about the same size.
devices.ini
Devices.xml

I wonder what happens if you download a new version of minipro and overwrite the database with the current version, after the user has added a new device. For that case a per-user config additional to the system-wide database would be required.

We already have the infoic_custom.h and infoic2plus_custom.h files just for this purpose :)

I need to test a ini or xml parser to see the performance parsing about ~13000 devices.

Even if we move the database to external files should we keep the actual structure? i mean two database files per programmer? like infoic_devices.ini + infoic_custom.ini for TL866A/CS and infoic2plus_devices.ini + infoic2plus_custom for TL866II+ ?

For example:

(TL866A)

[ATMEGA8]
protocol_id = 0x71
variant = 0x43
read_buffer_size =  0x100
write_buffer_size = 0x40
code_memory_size = 0x2000
data_memory_size = 0x200
data_memory2_size = 0x00
chip_id = 0x1e9307
chip_id_bytes_count = 0x03
opts1 = 0x400
opts2 = 0x40
opts3 = 0x01
opts4 = 0x182630
opts5 = 0x00
opts6 = 0x00
opts7 = 0x81
package_details = 0x1c000700
config = avr2_fuses

(TL866II+)

[ATMEGA8@DIP28]
protocol_id = 0x1d
variant = 0x43
read_buffer_size =  0x100
write_buffer_size = 0x40
code_memory_size = 0x2000
data_memory_size = 0x200
data_memory2_size = 0x00
chip_id = 0x1e9307
chip_id_bytes_count = 0x03
opts1 = 0x04
opts2 = 0x40
opts3 = 0x01
opts4 = 0x182630
opts5 = 0x0400
opts6 = 0x00
opts7 = 0x81
opts8 = 0x0c25
package_details = 0x1c000700
config = avr2_fuses

We have a different protocol_id, variant and other members.
If we concatenate this we should double some options like protocol_id1 and protocol_id2. Not to say we should add a flag option to indicate if that device is supported in TL866A/CS or something like that.
But i consider this ugly not to say the big amount of work. Also if we want in the future to add support for the newest T56 programmer the database should be changed again.

So, my vote is two files per programmer.

In GitLab by @elosha on Jul 29, 2020, 04:41

Thanks. I did a quick research:

• Lots of toolkits already support XML, including the NSXMLParser class built into Mac OS X and GNUStep. This is good for GUI programming. And since the XML style you chose is reduced, it could well be parsed without fully blown XML support, too.

• The Ini landscape is more scattered. But the database still has a very predicable format, writing a simple parser might be an option.

Regarding databases: I'm not too deep into this project, but your "double options" approach is not too bad.

However I understand it would be troublesome to synchronize supported devices across programmers with each release of new firmware or programmer. (It would be easier to switch to a real NoSQL database then. With devices stored in a table. And algorithms for each programmer in another table, referencing the devices.) So I'm fine with your decision. I could still query minipro by the -l command ;)

In GitLab by @radiomanV on Jul 29, 2020, 07:12

Good, then XML should be :)
I'm not going to reinvent the wheel and write an XML parser for C.
Here: http://ezxml.sourceforge.net/ is an XML parser used by me in the past. Quite lightweight and fast enough. This way we don't need to add other dependencies(only two files ezxml.h and ezxml.c).
The XML format is good enough to make a single file for both databases including custom devices.

In GitLab by @radiomanV on Jul 31, 2020, 12:33

Implemented external database as XML. There's a single xml for both databases :)

Because the chips are sorted by manufacturer name, the database is loaded fast (150ms for ~30000 devices) unlike the first xml posted above which was loaded quite slow (2Sec for 30000 chips).

The database is copied to $PREFIX/share/minipro (/usr/local/share/minipro in our case) when make install command is issued. Windows is also supported by disabling the memory mapping (mmap) support in ezxml and use malloc to load the xml file instead of memory mapping like in Linux. Mac OS and ARM platforms were not tested.

The database file is searched at startup in $PREFIX/share/minipro and the current directory which, has the highest priority. For Windows the database is searched under %APPDATA%\minipro (usually c:\ProgramData\minipro) and current directory which, again, has the highest priority.

There are some xml attributes which are not used right now (manufacturer name and chip type). These can be ignored, i use them only to help me visually.

database

Here is: https://gitlab.com/radiomanV/minipro/-/tree/external_db (the external_db branch).
Please test and report back. If you agree with the database format i should make a merge request.

In GitLab by @elosha on Aug 2, 2020, 06:51

Good work! Sorting by manufacturers is excellent. The manufacturer name and chip type still can be using for filtering with a GUI very well.

It works on my trusty Mac OS X 10.9.5 (64bit) and using Apple LLVM version 5.1 (clang-503.0.40) (based on LLVM 3.4svn) … yes that's old :)

(Not tried to read with NSXMLParser, but should work. Otherwise I'll use ezxml, too. This is on the list for a 2.0 release of my GUI application and 1.0 still doesn't work.)

However I noticed quite a delay while handling the 12MB XML file on my i5 with 4GB of RAM, despite having a SSD. Both with minipro and external editors.

If we need just one database at a time – because most people only have one programmer – maybe separate files for separate databases wouldn't be that bad? That could speed it up, and save CPU and battery, if you only read the 6MB you need. :wink:

In GitLab by @radiomanV on Aug 2, 2020, 10:27

Unfortunately that delay is not the time loading from disk.
The ezxml will parse the xml after the file is loaded from disk and if no error then will return an ezxml_t valid pointer.

The load time is very low; 0,02ms with mmap and 10-15ms without mmap (fopen, malloc, fread style) regardless of file size (6 or 12 MB).
So this is more of a file structure delay.

I can make two xml files but the parse time will be the same.

I have added some benchmarking to ezxml: Load time= load from disk, parse time= ezxml validation, read time= minipro reading all chips from ezxml structure.

12MB XML file, mmap
./minipro -V
Database load time:  0.03ms  OK.// very low because of mmap :)
Database parse time:  191.70ms  OK.
Database read time:  116.29ms  OK.

12MB XML file, no mmap
./minipro -V
Database load time:  17.08ms  OK. // note the increased load time from disk
Database parse time:  187.91ms  OK.
Database read time:  116.38ms  OK.

Now let's shrink the xml databse to 6MB (only TL866II)

6MB XML file, mmap
./minipro -V
Database load time:  0.03ms  OK. // same load time like the 12MB file
Database parse time:  238.32ms  OK. // same parse time for one database!
Database read time:  65.13ms  OK. // half time because we only read a single database

6MB XML file, no mmap
./minipro -V
Database load time:  16.69ms  OK. // same load time like the 12MB file
Database parse time:  219.44ms  OK. // same parse time for one database!
Database read time:  66.13ms  OK. // same time

So, we have a constant parse time regardless of how the big the file is. The mmap feature will reduce the load time but this is however, still low. The database read time in minipro can be reduced by loading only the needed database on demand (this will halve the time).

The only issue is that ~200ms parse time which is CPU dependent. All tests were made with my Intel G4560/16GB DDR4, Linux Mint development machine.
So two xml files (or more) doesn't help at all.

Now some delights: the first approach database (unsorted by manufacturer) with devices structured like this:

<?xml version="1.0" encoding="utf-8"?>
<infoic>
  <database name="TL866A">
    <devices>
       <ic name="bla" />
       <ic name="bla" />
      ...~30000 devices
    </devices>
  </database>
</infoic>

./minipro -V
Database load time:  0.03ms  OK. // Low load time
Database parse time:  10672.33ms  OK. // Boo! 10 seconds :( 
Database read time:  0.54ms  OK.

So, it is more important how we organize the xml.
This ezxml_t ezxml_parse_str(char *s, size_t len) from ezxml.c is the culprit. Perhaps some optimizations can be made.

Please do a git pull and recompile (the same external_db branch) to see the database benchmarks and if you want post the result here.

In GitLab by @radiomanV on Aug 7, 2020, 10:48

I'm using an XML SAX parser now. The xml file is read top down and devices are printed or loaded depending on what we are doing. This way the entire process is fast and low on memory (unlike the ezxml DOM approach).
Please test this here: https://gitlab.com/radiomanV/minipro/-/tree/sax_database (the sax_database branch)

In GitLab by @elosha on Aug 10, 2020, 08:01

Thanks for testing and explaining this thoroughly. Now I understand.

Tested the SAX parser: Great performance on my machine too, almost feels like the builtin database. It was worth it! :thumbsup:

In GitLab by @radiomanV on Aug 10, 2020, 09:37

almost feels like the builtin database.

Traversing the entire database is something about 5-10ms, so indeed is fast but i had to rewrite the database.c.
There's a somewhat complicated state machine behind this approach which gave me some headaches, but indeed it was worth it :)

@DavidGriffith should we change the database to XML format?

Sorry about the delay in getting to this. The database should definitely be move outside of the binary. Having it in the binary has made things rather opaque. Naturally this will complicate installation a bit, and I see you've addressed that in !171. XML seems like a good choice. I'll go over this tonight and do the merge tomorrow.

Thanks! This looks really clean.

DavidGriffith / minipro-import-test

Move device support data to external files #178