bojand / infer

Small crate to infer file and MIME type by checking the magic number signature
MIT License

Unable to infer LZMA files #91

Open rawhuul opened 1 year ago

rawhuul commented 1 year ago

I have these files, which are LZMA archives according to both their extension and the 7z archiver.


But when I pass them to this code:

use std::path::PathBuf;

#[derive(Debug)]
pub enum FileKind {
    ZipArchive,
    SevenZipArchive,
    TarArchive,
    GZipArchive,
    LZMAArchive,
    LZHArchive,
    Other,
}

impl FileKind {
    pub fn infer(path: &PathBuf) -> Self {
        println!("{:?}", path);

        let kind = infer::get_from_path(path).unwrap();
        println!("{:?}", kind);

        match kind {
            Some(k) => match k.mime_type() {
                "application/zip" => Self::ZipArchive,
                "application/x-7z-compressed" => Self::SevenZipArchive,
                "application/x-tar" => Self::TarArchive,
                "application/gzip" => Self::GZipArchive,
                "application/x-xz" => Self::LZMAArchive,
                "application/x-lzip" => Self::LZMAArchive,
                _ => Self::Other,
            },
            None => Self::Other,
        }
    }
}

It gives output:

warning: `scoopie` (bin "scoopie") generated 7 warnings (run `cargo fix --bin "scoopie"` to apply 5 suggestions)
    Finished dev [unoptimized + debuginfo] target(s) in 3.82s
     Running `target\debug\scoopie.exe install coreutils`
Commands { cmd: Install(InstallCommand { app: Some("coreutils"), download_only: false, sync: false, update_all: false }) }
"C:\\Users\\Rahul\\scoopie\\cache\\coreutils#5.97.3#https_downloads.sourceforge.net_project_mingw_MSYS_Base_msys-core_msys-1.0.13-2_msysCORE-1.0.13-2-msys-1.0.13-bin.tar.lzma"
None
"C:\\Users\\Rahul\\scoopie\\cache\\coreutils#5.97.3#https_downloads.sourceforge.net_project_mingw_MSYS_Base_gettext_gettext-0.17-2_libintl-0.17-2-msys-dll-8.tar.lzma"
None
"C:\\Users\\Rahul\\scoopie\\cache\\coreutils#5.97.3#https_downloads.sourceforge.net_project_mingw_MSYS_Base_libiconv_libiconv-1.13.1-2_libiconv-1.13.1-2-msys-1.0.13-dll-2.tar.lzma"
None
"C:\\Users\\Rahul\\scoopie\\cache\\coreutils#5.97.3#https_downloads.sourceforge.net_project_mingw_MSYS_Base_termcap_termcap-0.20050421_1-2_libtermcap-0.20050421_1-2-msys-1.0.13-dll-0.tar.lzma"
None
"C:\\Users\\Rahul\\scoopie\\cache\\coreutils#5.97.3#https_downloads.sourceforge.net_project_mingw_MSYS_Base_coreutils_coreutils-5.97-3_coreutils-5.97-3-msys-1.0.13-bin.tar.lzma"
None
"C:\\Users\\Rahul\\scoopie\\cache\\coreutils#5.97.3#https_downloads.sourceforge.net_project_mingw_MSYS_Base_coreutils_coreutils-5.97-3_coreutils-5.97-3-msys.RELEASE_NOTES.txt"
None
Ok(())

Am I doing something wrong, or is something else going on? Also, how can I infer LZH archives?

STashakkori commented 1 year ago

@bojand and collaborators: I believe I ran into this a while ago when developing Salvum. Please check out the archive.rs that I modified at the time to see if there is anything useful, then merge in whatever you want. I apologize if this is not the best way to collaborate, but I have way too much code to push right now, so forgive me, I must press forward.

https://github.com/STashakkori/Salvum_Infer/blob/main/src/matchers/archive.rs

Specific to LZMA, I added this:

pub fn is_lzma(buf: &[u8]) -> bool {
    buf.len() > 4
        && buf[0] == 0x5D
        && buf[1] == 0x00
        && buf[2] == 0x00
        // buf[3] and buf[4] are the upper dictionary-size bytes; values assumed to be hex.
        && matches!(buf[3], 0x80 | 0x01 | 0x10 | 0x08 | 0x20 | 0x40 | 0x00)
        && matches!(buf[4], 0x00 | 0x01 | 0x02)
}
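A slightly more general check is also possible. The sketch below is not taken from the linked archive.rs. It assumes the legacy .lzma header layout (one properties byte, a 4-byte little-endian dictionary size, then an 8-byte uncompressed size) and decodes the dictionary size instead of listing individual byte values. The name `looks_like_lzma_alone` and the power-of-two rule are illustrative assumptions, and the check is a heuristic that can produce false positives.

```rust
/// Hypothetical, heuristic matcher for the legacy `.lzma` ("LZMA alone") header.
/// Assumed layout: byte 0 = properties, bytes 1..5 = dictionary size (u32, little-endian),
/// bytes 5..13 = uncompressed size (u64, little-endian).
pub fn looks_like_lzma_alone(buf: &[u8]) -> bool {
    if buf.len() < 13 {
        return false;
    }
    // The properties byte encodes (pb * 5 + lp) * 9 + lc, so valid values are 0..=224.
    if buf[0] > 224 {
        return false;
    }
    let dict_size = u32::from_le_bytes([buf[1], buf[2], buf[3], buf[4]]);
    // Assumption: the encoder chose a power-of-two dictionary of at least 4 KiB.
    dict_size >= 4096 && dict_size.is_power_of_two()
}
```

For a header starting with `5D 00 00 80 00`, the dictionary size decodes to 8 MiB and the function returns true. As far as the crate's README describes, a `fn(&[u8]) -> bool` of this shape can be registered on an `infer::Infer` instance through `add(mime_type, extension, matcher)`; the MIME string to use (for example `application/x-lzma`) is left to the caller.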

@rawhuul, see if this addresses your issue; if not, apologies. Best, $t@$h
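The LZH question from the original post is not covered by the matcher above. A rough sketch of a matcher for it follows, assuming the usual LHA header layout in which the compression method ID (such as `-lh5-` or `-lzs-`) sits at byte offset 2. The function name `looks_like_lzh` is hypothetical, and the check is deliberately loose.

```rust
/// Hypothetical matcher for LHA/LZH archives.
/// Assumes the 5-byte method ID ("-lh5-", "-lzs-", ...) starts at byte offset 2.
pub fn looks_like_lzh(buf: &[u8]) -> bool {
    buf.len() > 6
        && buf[2] == b'-'
        && buf[3] == b'l'
        && (buf[4] == b'h' || buf[4] == b'z')
        && buf[6] == b'-'
}
```

The method character at offset 5 is not inspected, so any `-lh?-` or `-lz?-` ID is accepted. Tightening that to a known list of method IDs would reduce false positives.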