adamansky closed this issue 1 year ago.
The problem is that there are some weird characters in the meta table, including the 0x04 and 0x06 ASCII control characters. Those are obviously not XML-compatible, but xml.etree accepted them and wrote them directly to the output. It seems that lxml.etree would raise an exception instead.
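The behaviour described above can be reproduced with the standard library alone. This is a minimal sketch using xml.etree.ElementTree (in Python 3 it uses the C accelerator automatically; xml.etree.cElementTree itself was removed in Python 3.9). The element name and sample string are illustrative and are not taken from the actual export script.

```python
import xml.etree.ElementTree as ET


def serialize_track(title: str) -> str:
    """Serialize a <track> element the way a naive exporter might."""
    elem = ET.Element("track")
    elem.text = title
    # ElementTree escapes &, < and > but passes control characters through.
    return ET.tostring(elem, encoding="unicode")


def is_well_formed(document: str) -> bool:
    """Return True if the expat-based parser accepts the document."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


# 0x04 and 0x06 are the control characters found in the meta table.
bad_xml = serialize_track("abc\x04def\x06")
print(repr(bad_xml))            # the raw control characters are still there
print(is_well_formed(bad_xml))  # False: XML 1.0 forbids them
```

Ordinary text such as "Séries & <b>" round-trips fine, because ElementTree escapes the markup characters; only characters outside the XML character range slip through unchecked.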
acoustid=> select * from meta where id=3454685;
-[ RECORD 1 ]+---------------------------------
id | 3454685
track | \x1FxœcpJLOILQHNLKUÀ\x04 xƒ\x06C
artist | Television
album | Séries
album_artist |
track_no |
disc_no |
year | 2000
There is no reason for such characters to be there, so I guess I'll have to add some extra validation. I'll also switch to lxml, which seems more reliable.
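One possible shape for that extra validation, as a sketch: check every value against the XML 1.0 Char production before it is written to the dump, and either report or strip the offending characters. The helper names below are hypothetical, and whether to strip the characters or reject the row is a design choice this thread does not settle.

```python
import re

# Characters allowed by the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# The pattern matches any single character OUTSIDE that set.
_INVALID_XML_CHARS = re.compile(
    "[^\t\n\r\u0020-\ud7ff\ue000-\ufffd\U00010000-\U0010ffff]"
)


def find_invalid_xml_chars(value: str) -> list[str]:
    """Return the characters in value that XML 1.0 does not allow."""
    return _INVALID_XML_CHARS.findall(value)


def strip_invalid_xml_chars(value: str) -> str:
    """Remove characters that cannot appear in an XML 1.0 document."""
    return _INVALID_XML_CHARS.sub("", value)


value = "abc\x04def\x06"
print(find_invalid_xml_chars(value))   # ['\x04', '\x06']
print(strip_invalid_xml_chars(value))  # abcdef
```

Tab, newline and carriage return are kept, as are legitimate non-ASCII values such as "Séries", so the check only touches data that could never be serialized as valid XML 1.0.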
Hi Lukáš! I've been working on an acoustid replication script and found an invalid replication dump: http://data.acoustid.org/replication/acoustid-update-4620.xml.bz2
xml.sax.parse failed on this particular replication set. It may be a bug in xml.etree.cElementTree (used in export_tables.py), but XML escaping should be handled properly during XML generation, as shown in the sample:
Simple repair solution:
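Independently of any particular repair, escaping alone cannot make these specific characters valid: XML 1.0 forbids U+0004 and U+0006 even as character references, so the expat-based xml.sax parser rejects &#4; just as it rejects the raw byte. A minimal check with the standard library (the sample documents are illustrative):

```python
import xml.sax
from xml.sax.handler import ContentHandler


def sax_accepts(document: bytes) -> bool:
    """Return True if xml.sax can parse the document without errors."""
    try:
        # The default error handler raises SAXParseException on fatal errors.
        xml.sax.parseString(document, ContentHandler())
    except xml.sax.SAXParseException:
        return False
    return True


raw = b"<track>abc\x04def</track>"      # control character written as-is
escaped = b"<track>abc&#4;def</track>"  # same character as a reference
clean = b"<track>abcdef</track>"        # character removed

print(sax_accepts(raw))      # False
print(sax_accepts(escaped))  # False: &#4; is not a legal character reference
print(sax_accepts(clean))    # True
```

In practice this means the bad characters have to be removed or rejected before export; escaping them during XML generation would still produce a dump that xml.sax refuses.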