Open vadimkantorov opened 2 months ago
Okay, I figured out how to get nice names for TexLive ISO files. I adapted the auto mode from https://github.com/clalancette/pycdlib/blob/master/tools/pycdlib-extract-files. Maybe it would be nice if such an auto mode were supported directly in the list_children(...) (and other) API, e.g. by introducing an argument auto_path="" (and, if not for back-compat, maybe it could become the default when a path is passed as a positional arg and not a kwarg).
For this TexLive ISO file, what is strange is that the auto mode detects Rock Ridge, but using rr_path in place of joliet_path again gives file names that are not nice.
Even if a new arg is not introduced, it would be nice to have a note on joliet_path in the Examples doc: https://clalancette.github.io/pycdlib/example-opening-existing-iso.html and https://clalancette.github.io/pycdlib/example-extracting-data-from-iso.html
So the remaining question on file offsets seems worked-around in
so one can hope for a more explicit API/example for getting file offsets and file sizes (basically multiplying child.orig_extent_loc by iso.logical_block_size and maybe placing the result into child.data_offset).
import io
import pycdlib

iso = pycdlib.PyCdlib()
iso.open('../texlive2024-20240312.iso')

# Auto mode adapted from tools/pycdlib-extract-files
if iso.has_udf():
    pathname = 'udf_path'
elif iso.has_rock_ridge():
    pathname = 'rr_path'
elif iso.has_joliet():
    pathname = 'joliet_path'
else:
    pathname = 'iso_path'
print(pathname)

# joliet_path is used explicitly: it gives the nice names for this ISO
for child in iso.list_children(joliet_path='/'):
    print(child.file_identifier().decode('utf-8'))

# Extract a single file into memory and print its contents
extracted = io.BytesIO()
iso.get_file_from_iso_fp(extracted, joliet_path='/README')
print(extracted.getvalue().decode('utf-8'))

iso.close()
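The offset arithmetic mentioned above (extent location times logical block size) and the mmap goal can be sketched as follows. This is a minimal sketch under assumptions: byte_range, read_range and demo are hypothetical helpers, the demo builds a tiny stand-in file rather than a real ISO, and it assumes a file's data sits in one contiguous extent. With a real image, the inputs would presumably come from child.orig_extent_loc, iso.logical_block_size and the record's data length.

```python
import mmap
import os
import tempfile


def byte_range(extent_loc, block_size, data_length):
    """Translate an extent location (in logical blocks) into a
    (start, end) byte range within the image."""
    start = extent_loc * block_size
    return start, start + data_length


def read_range(image_path, start, end):
    """Read bytes [start, end) of the image through a read-only memory map."""
    with open(image_path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[start:end]


def demo():
    """Build a stand-in image (not a real ISO) and read one 'file' back."""
    block_size = 2048
    payload = b"hello from block 3"
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        # Three empty blocks, then the payload starting at block 3
        tmp.write(b"\0" * (3 * block_size) + payload)
        image_path = tmp.name
    try:
        start, end = byte_range(3, block_size, len(payload))
        return read_range(image_path, start, end)
    finally:
        os.unlink(image_path)
```

Keeping the arithmetic in a separate function makes it easy to check the computed ranges against what get_file_from_iso_fp returns for the same file before relying on them.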
UPD: I guess in 2024 pure ISO files are not very common, so for getting nice file names out it would be awesome to showcase auto-detection or use of joliet_path in:
Thanks!
Hi!
I'm trying out pycdlib for working with TexLive ISO distribution files: https://tug.ctan.org/systems/texlive/Images/texlive2024-20240312.iso. The end goal is computing the file offsets and using mmap(...) to read the individual files in the ISO (as then we can virtualize the files using fmemopen(...)).
prints
If I mount the ISO on my Windows machine, I get the following dir listing:
Why are the filenames from pycdlib always coming out in uppercase and shortened (note install-tl-windows.bat coming out as INSTALL_.BAT;1 or README as README.;1), and why do they come out nicely in Windows dir? Is it because the TexLive ISO file is using some tricky ISO standard extension? How does one get a nice filename listing with pycdlib?
Thanks a lot!