Closed i30817 closed 6 years ago
Excellent, glad to hear you are using the library.
I want to make sure the library can handle a wide variety of ISO files, so I am definitely in favor of making it more relaxed. However, I'm not really in favor of adding a new mode. Instead, what I will probably do is unconditionally relax some of these constraints during parsing. Mostly they have been there for sanity, but the library is mature enough that I think removing them makes sense. I'll remove some of them (probably early next week), then update this issue once I do so you can give it another try.
Thank you!
After looking at this for a bit, my suspicion is that the library doesn't actually need to be more permissive. Instead, I think that some of the .bin/.cue files are using track sizes that pycdlib doesn't expect. In particular, pycdlib expects a 2048-byte track/extent size (the "user" track size), while some of these files have a 2352-byte track/extent size (the "physical" track size).
However, this is just a theory right now. Is there any way you could somehow share at least one of the .bin/.cue file combinations with me? That would make it a bit easier to confirm whether that is what is happening.
They're indeed all MODE2/2352 cue/bin (this was a psx standard iirc) and can easily be found in any redump torrent, but i'll print out names and md5sums of the games data track that fail on my not complete collection. Redump site has a complete collection of the cues and the dat with the md5sums too (because they're not copyrighted).
https://gist.github.com/i30817/68b11f17364192fbb5461a3e6c57c06f
edit: i probably should have printed the exceptions too. Do you want them? edit 2: updated it to include traceback. Also these are all only 'disc1' of games with multiple discs for simplicity sake.
From that list of errors, i'll notice that these are hardpatched with a translation, so you probably shouldn't count them too much because it might have been a translator error. Indeed, most of these correspond to the more exotic errors above (the main dump error being 'year out of range'): Fantastic Night Dreams - Cotton Original Policenauts Echo Night 2 Super Robot Taisen Alpha Gaiden Brigandine - Grand Edition
The smallest there that is not a translation (that would be fantastic night dreams cotton) is probably "Harvest Moon - Back to Nature (USA)". Searching google for it (redump dump) and checking the md5sum matches should be enough for a example of the 'year out of range' error. I'm sorry but i don't have a ps1 demos isos to test.
What is confusing me here is that the 'reader' adapter actually reads bytes in 2048 increments, whether the original was psx cds (CD-ROM XA Mode 2, Form 1)or not (iso) have 2048 bytes of data as usual. I guess the 'raw' data being parceled out as 2048 bytes sectors is not enough not to appease the parser if the rest of the error correction data is 'lost'? (why?)
The idea i had is that libmirage tracks are free of those extra bytes anyway (the Sync pattern, address, mode, subheader, error detection and error correction bytes), so its tracks are always a multiple of 2048 bytes and the sectors 2048 bytes.
Thanks for the pointers. I'll take a look at the ISOs.
You may be right, and the data coming out of libmirage may just be the 2048 user data. So maybe my theory is bunk :). It just seemed odd to me that so many different images had so many problems, and that was one thought of mine. I'll find out for sure once I dump out some additional data.
Well, for every one that fails there are more or less 4 that don't. Not that that makes sure they're parsing correctly though.
I think the issue here is that the date/time fields contain non-standard values, resulting in exceptions when trying to parse the date/time.
For example, I have a PC game disc image that fails with:
ValueError: time data '00000000000000' does not match format '%Y%m%d%H%M%S'
The date/time fields that are supposed to be undefined are:
b'0000000000000000T'
The last T is a deviation from the standard; it should be a binary zero (\x00) for the timezone offset. Due to the last 'T', the string passes the pycdlib's check for empty string (all \x00, all ASCII 0, or 16 ASCII 0 followed by a \x0)
The PSX images, on the other hand, appear to contain invalid time format for volume creation date/time; e.g, Fear Effect (USA) (Disc 1) has:
b'0000020122400000$'
(while the rest are properly marked as empty)
One work-around for these broken formats would be to place the date/time parsing code into try/catch block, and mark the descriptor date as non-present upon parsing exception.
Thanks for that analysis, that was really useful. I've now made some changes based on your recommendations. First, I've made it so that we don't care what's in the unused fields for directory records. That should fix your first problem. Second, I've taken your advice and now do the time.strptime
date parsing in a try..except ValueError
block, and just assume the date is unspecified if we ever see it. I think this should fix up most of your errors, so please pull master, try it out, and let me know how it goes.
This still leaves a couple of other problems:
Policenauts (Japan) (Disc 1).cue (translation)
raise pycdlibexception.PyCdlibInvalidISO("Invalid root directory entry identifier")
PyCdlibInvalidISO: Invalid root directory entry identifier
Super Robot Taisen Alpha Gaiden (Japan) (v1.1).cue (translation)
File "/usr/local/lib/python2.7/dist-packages/pycdlib/headervd.py", line 523, in parse
raise pycdlibexception.PyCdlibInvalidISO("File structure version expected to be 1")
PyCdlibInvalidISO: File structure version expected to be 1
I'm much more suspicious of these types of problems, since these fields are more integral to how the ISO works. It looks like there is just one that has an Invalid root directory entry identifier
, and two that have the wrong File structure version
. I'll think about those a bit more.
Those are caused by translations, that's why i pointed them out above. Their md5sum is not redump standard (or better, it was redump standard before the translation). It's kind of debatable if you should bother to work around it. In one hand, they obviously work (and can be mounted on cdemu and were often tested in hardware). On the other: it's a uncommon hacker error.
I'd prefer if the standard - with a switch so it doesn't bother people that actually want validation - was relaxed with some defaults that work though, since the whole point of the tool i'm making is not to care about translations or dump differences before extracting a id.
Something like
def parse_relaxed(lambda, default):
try:
return lamda()
catch:
if(relaxed):
return default
else:
raise
BTW, here are the relevant translation patches (to the redump originals):
http://www.romhacking.net/translations/1422/ (policenauts)
https://www.romhacking.net/translations/265/ (cotton - i believe i further processed the result of this with a ECC corrector because the patch was badly done and didn't work on mednafen otherwise)
https://www.romhacking.net/translations/2380/ (echo night 2)
https://www.romhacking.net/translations/859/ (super robot wars alpha gaiden)
Ok, i just tested it out with the new code installing from pip with master (after editing the utility on the first link to handle seek with os.SEEK_END optional parameter), and every disc i have manages except the three translated games: 2 discs of policenauts, 1 disc echo night 2, and 1 disc of super robot taisen alpha gaiden.
Cotton apparently passes now (it was the one that had 'PyCdlibInvalidISO: data in 3rd unused field not zero')
I don't know if you'd want your project to be as permissive by default to malformed data as apparently cdemu and various emulators can be, i leave you to decide the fate of those 3. I don't have a full redump collection so i can't really tell if those are all the possible errors (not to mention the unlimited set of future translations), so i'll wait for what you decide.
Eh, tried to make the wrapper deal with mixed mode cds (basically playstation cds with mode2 form 2 sectors for videos and audio) by making a scheme where the wrapper constructor scans the file for the form2 sectors and read(index) adapts to them to always get correct quantity of bytes, and pycdlib reacts worse than the 'incorrect' way (faking that every sector is 2048 and just truncate the bytes from form2 sectors). Oh well, i'm not trying to make a generic extractor so no problem.
I've ended up doing a few more commits which allow anything in the root directory record file_ident field and in the file structure version field. If the library sees that they are wrong, it basically changes the in-memory versions to standards-compliant ones, and goes on. That should allow you to parse the rest of your ISOs.
In general, I am trying to make the library accept as many ISOs as possible. But I think I'd rather "fault-in" changes like this as people run across them, rather than making it all relaxed to start with.
Anyway, let me know if it works for you. If so, I'll close this out. Thanks again for your ideas, patience, and testing here.
Works for me. There is just a dump translation that apparently deleted the system description and another app issue of ps1 and ps2 discs having the same system description, but that's nothing to do with the library.
Great, thanks for the testing. I'll close this for now; feel free to open a new issue if you run into other problems.
I'm using this library along with libmirage to extract some serials and other ids from console cd images. https://gist.github.com/i30817/f58c7f904fe44c4599a6c9d48a60bce0
The psx redump collection has several games that choke on this, most notably from strict value checking when parsing.
Some (non complete) examples are:
Fantastic Night Dreams - Cotton Original (Japan).cue (this one has a fanpatch though) File "/usr/local/lib/python2.7/dist-packages/pycdlib/headervd.py", line 520, in parse raise pycdlibexception.PyCdlibInvalidISO("data in 3rd unused field not zero") PyCdlibInvalidISO: data in 3rd unused field not zero
Fear Effect (USA) (Disc 1).cue (original) ... File "/usr/local/lib/python2.7/dist-packages/pycdlib/headervd.py", line 563, in parse self.volume_creation_date.parse(vol_create_date_str) ... ValueError: year is out of range
Policenauts (Japan) (Disc 1).cue (translation) raise pycdlibexception.PyCdlibInvalidISO("Invalid root directory entry identifier") PyCdlibInvalidISO: Invalid root directory entry identifier
Super Robot Taisen Alpha Gaiden (Japan) (v1.1).cue (translation) File "/usr/local/lib/python2.7/dist-packages/pycdlib/headervd.py", line 523, in parse raise pycdlibexception.PyCdlibInvalidISO("File structure version expected to be 1") PyCdlibInvalidISO: File structure version expected to be 1
Of this by far the most common is 'year out of range'.