daokoder / dao-modules

Dao Standard Modules
http://daovm.net
web.mime - enhancements #48

Closed dumblob closed 9 years ago

dumblob commented 9 years ago

Currently only the method identify(target: string) => list<string> is available, but a method with reverse semantics, extof(mime: string) => list<string>, returning the feasible extensions would also be convenient (not sure if the relationship is really N:M, but if it is only 1:M, then this should be correct: extof(mime: string) => string).
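
To make the proposed reverse lookup concrete, here is a minimal Python sketch (not Dao, and not how web.mime is implemented). Only the names identify and extof come from the proposal; the sample table and the helper build_reverse are assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample table; a real module would ship a much larger one.
EXT_TO_MIME = {
    "jpg": ["image/jpeg"],
    "jpeg": ["image/jpeg"],
    "png": ["image/png"],
    "c": ["text/x-c"],
    "cpp": ["text/x-c"],
}


def build_reverse(ext_to_mime):
    """Invert extension -> MIME types into MIME type -> extensions."""
    reverse = defaultdict(list)
    for ext, mimes in ext_to_mime.items():
        for mime in mimes:
            reverse[mime].append(ext)
    return dict(reverse)


MIME_TO_EXT = build_reverse(EXT_TO_MIME)


def identify(target):
    """Forward lookup by file name extension (case-insensitive)."""
    _, dot, ext = target.rpartition(".")
    return EXT_TO_MIME.get(ext.lower(), []) if dot else []


def extof(mime):
    """Reverse lookup: every known extension for a MIME type."""
    return MIME_TO_EXT.get(mime.lower(), [])
```

Returning a list from both functions keeps the sketch valid whether the relationship turns out to be 1:M or N:M.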

Another enhancement (based on how the Go mime module handles it) would be reading /etc/mime.types, /etc/apache2/mime.types and /etc/apache/mime.types (if available on Unix systems) and the Windows registry on Microsoft systems. This might also require yet another method, refresh(), which runs stat() and updates the mapping if something has changed.
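
A rough Python sketch of the stat()-based refresh idea, under stated assumptions: the class name MimeDbWatcher and the loader callback are hypothetical, and the Windows registry part is not covered. It only shows how a reload can be skipped when the file's modification time has not changed.

```python
import os

# Paths named in the comment above; the first existing one wins.
CANDIDATES = [
    "/etc/mime.types",
    "/etc/apache2/mime.types",
    "/etc/apache/mime.types",
]


class MimeDbWatcher:
    """Reloads a mapping file only when its modification time changes."""

    def __init__(self, candidates, loader):
        self.candidates = list(candidates)
        self.loader = loader  # hypothetical hook: called with the file path
        self.path = None
        self.mtime_ns = None

    def _find(self):
        """Return the first candidate path that is an existing file."""
        for path in self.candidates:
            if os.path.isfile(path):
                return path
        return None

    def refresh(self):
        """Return True if the database was (re)loaded, False otherwise."""
        path = self._find()
        if path is None:
            return False
        mtime_ns = os.stat(path).st_mtime_ns
        if path == self.path and mtime_ns == self.mtime_ns:
            return False  # nothing changed since the last load
        self.loader(path)
        self.path, self.mtime_ns = path, mtime_ns
        return True
```

A caller would construct MimeDbWatcher(CANDIDATES, some_loader) once and call refresh() before lookups, or on a timer.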

And finally, the biggest improvement would be to add support for "real" MIME identification based on https://github.com/file/file/ (compiled statically into the module, including the magic DB, which is about 408KB in plain text) and to extend the interface to something like identify(target: string|stream|Entry|File, magic = false) => list<string> (Entry is not an issue, because a directory usually also has a MIME type - see http://stackoverflow.com/a/26211796).
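
To illustrate what a magic flag could mean, here is a toy Python sketch. It does not use libmagic or its database: it checks a handful of well-known leading byte signatures, and the small extension table is a hypothetical stand-in for the existing lookup. The inode/directory value for directories follows what the file utility reports.

```python
import os

# A few well-known leading byte signatures (NOT the libmagic database).
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"%PDF-", "application/pdf"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]

# Hypothetical extension table used when magic is False.
EXTENSIONS = {"png": "image/png", "jpg": "image/jpeg", "pdf": "application/pdf"}


def sniff(head):
    """Return MIME types whose signature matches the start of `head`."""
    return [mime for sig, mime in SIGNATURES if head.startswith(sig)]


def identify(target, magic=False):
    """Identify a path by extension, or by content when magic is True."""
    if os.path.isdir(target):
        return ["inode/directory"]
    if magic:
        with open(target, "rb") as f:
            return sniff(f.read(16))  # 16 bytes cover every signature above
    ext = os.path.splitext(target)[1].lstrip(".").lower()
    mime = EXTENSIONS.get(ext)
    return [mime] if mime else []
```

With this sketch a PNG image saved as picture.pdf is reported as application/pdf by extension but as image/png by content, which is the kind of mismatch by-content identification is meant to catch.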

Btw, I haven't relied on file extensions for about 6 years - I always use the magic database and work with files according to their content. It's a very safe way of treating files.

Night-walker commented 9 years ago

> Currently only the method identify(target: string) => list<string> is available, but a method with reverse semantics, extof(mime: string) => list<string>, returning the feasible extensions would also be convenient (not sure if the relationship is really N:M, but if it is only 1:M, then this should be correct: extof(mime: string) => string).

What for? Go's mime does not feature this, for instance.

dumblob commented 9 years ago

Well, I don't want to bother myself with extensions - I just want to save data in some format and name the file accordingly. Doing it manually is error-prone (especially on Windows this causes unexpected issues from my experience).

Night-walker commented 9 years ago

Added mime.updateDb() which reads mappings from a /etc/mime.types-like file.
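
The thread does not show how mime.updateDb() works internally, so the following Python sketch is only an assumption-laden illustration of reading a /etc/mime.types-like file: each line holds a MIME type followed by whitespace-separated extensions, and # starts a comment. The function names parse_mime_types and update_db are hypothetical.

```python
def parse_mime_types(text):
    """Parse mime.types-style text into {extension: [mime, ...]}."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        mime, *extensions = line.split()
        for ext in extensions:  # a type without extensions adds nothing
            types = mapping.setdefault(ext.lower(), [])
            if mime not in types:
                types.append(mime)
    return mapping


def update_db(db, path):
    """Merge mappings from a mime.types-like file into `db` (hypothetical)."""
    with open(path, encoding="utf-8", errors="replace") as f:
        parsed = parse_mime_types(f.read())
    for ext, mimes in parsed.items():
        known = db.setdefault(ext, [])
        known.extend(m for m in mimes if m not in known)
    return db
```

Merging into the existing table, rather than replacing it, keeps built-in mappings available when the external file is incomplete.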

Night-walker commented 9 years ago

> Well, I don't want to bother myself with extensions - I just want to save data in some format and name the file accordingly. Doing it manually is error-prone (especially on Windows this causes unexpected issues from my experience).

Still reluctant. There is little incentive to manually handle data received over http anyway...

> And finally, the biggest improvement would be to add support for "real" MIME identification based on https://github.com/file/file/ (compiled statically into the module, including the magic DB, which is about 408KB in plain text) and to extend the interface to something like identify(target: string|stream|Entry|File, magic = false) => list<string> (Entry is not an issue, because a directory usually also has a MIME type - see http://stackoverflow.com/a/26211796).

Not sure how much effort that will require. I had no intention of tackling 'invasive' MIME guessing.

dumblob commented 9 years ago

Hm, what about moving this module somewhere else, out of web? I'll use it mostly for something other than web stuff anyway.

Then the reluctance towards both extof() and MIME identification might get significantly lowered :wink:. The MIME identification might be postponed, though, as it would require writing a specific makefile.dao.

Night-walker commented 9 years ago

> Hm, what about moving this module somewhere else, out of web? I'll use it mostly for something other than web stuff anyway.

And then move it back to web if/when by-content guessing is implemented? :)

dumblob commented 9 years ago

Not really. Looking at the current list of modules in the standard library, web is not a bad place.

I'm pretty certain that extof() will be very useful - presumably not much in the context of http transfers, but everywhere else.

And as for MIME identification by content, apart from its usefulness (e.g. preventing once and for all user complaints like "It can't open the file on Windows." or preventing execution of "bak" files as "bat" files or any other mistake), I don't want to split MIME functionality between the standard library and user modules - especially when the interface of the whole module is so tiny (basically just key-value lookup in both directions and the updateDb() method).

Night-walker commented 9 years ago

> I'm pretty certain that extof() will be very useful - presumably not much in the context of http transfers, but everywhere else.

At least I would like to see a practical case, for Dao.

> And as for MIME identification by content, apart from its usefulness (e.g. preventing once and for all user complaints like "It can't open the file on Windows." or preventing execution of "bak" files as "bat" files or any other mistake), I don't want to split MIME functionality between the standard library and user modules - especially when the interface of the whole module is so tiny (basically just key-value lookup in both directions and the updateDb() method).

The module does way more than e.g. Mongoose (Marten, civetweb) regarding MIME handling, which is why I wrote it in the first place. Mongoose serves files by extension, and that apparently suits its users. I don't claim that web.mime is ideal in its current form, but it is arguably better to have at least such a module than nothing at all.

dumblob commented 9 years ago

> At least I would like to see a practical case, for Dao.

Every time I used this, it was somehow bound to MIME identification by content or at least to unpredictable user input. Last time it was when recovering files from a disk - file type identification based on MIME and automated naming with proper extensions. All other "less technical" use cases basically boil down to GUI - e.g. save dialogs should suggest a proper extension.

> I don't claim that web.mime is ideal in its current form, but it is arguably better to have at least such a module than nothing at all.

Sure. I just wanted to point out that extensions are unreliable and harmful, and are used because they're more user-friendly (they don't need any SW support and are usually visible - even though on modern Windows systems they're often hidden :cry:) and also much easier to implement. And that there is a solution in the form of by-content MIME identification, which we shouldn't omit :)

Night-walker commented 9 years ago

> Every time I used this, it was somehow bound to MIME identification by content or at least to unpredictable user input. Last time it was when recovering files from a disk - file type identification based on MIME and automated naming with proper extensions. All other "less technical" use cases basically boil down to GUI - e.g. save dialogs should suggest a proper extension.

All that is either too unlikely in the case of Dao or too abstract. I don't see an immediate need to deal with yet another set of mappings. I don't see a task at hand which that functionality would solve, so I'm just too lazy to do that.

> Sure. I just wanted to point out that extensions are unreliable and harmful, and are used because they're more user-friendly (they don't need any SW support and are usually visible - even though on modern Windows systems they're often hidden :cry:) and also much easier to implement. And that there is a solution in the form of by-content MIME identification, which we shouldn't omit :)

Perhaps there is a suitable magic library which can be wrapped with little effort?

dumblob commented 9 years ago

> All that is either too unlikely in the case of Dao or too abstract.

I've deliberately mentioned only real use cases.

> I don't see an immediate need to deal with yet another set of mappings. I don't see a task at hand which that functionality would solve, so I'm just too lazy to do that.

Let's make one up :) Imagine you're the organizer of a small programming contest and you're gathering short programs written in any programming language through an HTML submit form. Now, to run them on Windows, you need to save the form content to files with appropriate extensions after examining them for their MIME type using the magic library.

> Perhaps there is a suitable magic library which can be wrapped with little effort?

Not to my knowledge - everything I found is using the original libmagic from https://github.com/file/file/ . The binding itself shouldn't be so difficult.

Night-walker commented 9 years ago

> Let's make one up :) Imagine you're the organizer of a small programming contest and you're gathering short programs written in any programming language through an HTML submit form. Now, to run them on Windows, you need to save the form content to files with appropriate extensions after examining them for their MIME type using the magic library.

OK, various contestants submit .c and .cpp files which happen to have the same text/x-c mime type. Some files don't compile, contestants in ire :)
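
The ambiguity can be shown with a small Python sketch. The reverse table and the function pick_extension are hypothetical; the only detail taken from this comment is that .c and .cpp share text/x-c. The point is that a MIME type alone cannot choose between them, so the caller has to supply a preference or handle a failure.

```python
# Hypothetical reverse table; text/x-c covering both .c and .cpp is the
# case described in the comment above.
MIME_TO_EXT = {"text/x-c": ["c", "cpp"], "image/png": ["png"]}


def pick_extension(mime, preferred=None):
    """Return one extension for `mime`, or raise if the choice is ambiguous."""
    candidates = MIME_TO_EXT.get(mime, [])
    if preferred in candidates:
        return preferred  # caller-supplied tie-break
    if len(candidates) == 1:
        return candidates[0]
    raise LookupError(f"cannot pick one extension for {mime!r}: {candidates}")
```

Here pick_extension("text/x-c") raises LookupError, while pick_extension("text/x-c", preferred="cpp") succeeds.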

dumblob commented 9 years ago

Nice :) I didn't know that file doesn't do any additional prioritization (like C being a "subset" of C++) - see http://unix.stackexchange.com/questions/140238/file-command-confusing-c-c . Even though this can be solved, let's leave the mime module without messing around with offsets etc. for the time being.

The reverse mapping method still sounds tempting, though :)

daokoder commented 9 years ago

I think the mime module should not include the functionality to determine the MIME type of a file by examining its content. It's just too much for a simple module. Also, determining the MIME type that way is not always reliable. It is even doubtful that the standard modules should include such functionality, given its guessing nature and the big (for this functionality) size of the compiled magic file. I don't see a strong justification for this.

Night-walker commented 9 years ago

Actually, an external, user-provided magic file could be used instead of packing one with the module...

dumblob commented 9 years ago

> Actually, an external, user-provided magic file could be used instead of packing one with the module...

That actually makes sense, because there is no standard location for the magic file (especially not a multiplatform one).

daokoder commented 9 years ago

> Actually, an external, user-provided magic file could be used instead of packing one with the module...

This may make it very inconvenient to use, and seems to be against the purpose of standard modules.

Night-walker commented 9 years ago

Then I suppose the work is done here, at least for now.

dumblob commented 9 years ago

Can we mark this issue with some wish-list tag (a tag which says the features are neither approved nor abandoned) instead of closing it?

daokoder commented 9 years ago

I think we should close this issue for the time being, because there is no strong reason to include the MIME guessing functionality in standard modules. A new issue can be opened when a strong argument is found to support this feature.

dumblob commented 9 years ago

Ok. But the question about reverse mapping MIME -> extension still holds - how about pull requests? By the way, there is already one from @ShadauxCat.

Night-walker commented 9 years ago

Closing, as no new enhancements are planned at the moment.