PRX / fixer.prx.org

Media processing application.
MIT License

original_format job attribute #37

Closed pkarman closed 9 years ago

pkarman commented 9 years ago

The BaseProcessor class supports an original_format attribute, which seems to allow explicitly passing in the original file's content type, especially when it cannot be determined from the URL.

However, there does not seem to be a way to set that attribute when POSTing a new job. I would expect the Job API to accept an 'original_format' attribute to mirror the 'original' attribute.

kookster commented 9 years ago

Passing in the format is one way to address part of the problem.

The crux is that soxi, used often in audio_monster, won't return good info if a file has no extension (thanks soxi :unamused: )

Rather than passing it in, we can detect the file type using:

file --brief --mime-type some_local_file

or we can use the mimemagic gem that does something similar (though I think file is probably better). I use mimemagic in the speechmatics gem, and it has worked so far.

Given the type, we can set the extension of the created tempfile to use that, and everything will work.
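A minimal sketch of the `file`-based approach described above, assuming the `file` command is on the PATH. The `MIME_EXTENSIONS` table and the helper names (`extension_for_mime`, `detect_mime_type`, `tempfile_with_detected_extension`) are hypothetical and are not part of fixer or audio_monster; the MIME strings that `file` reports can also vary between versions, so the table is illustrative only.

```ruby
require 'open3'
require 'tempfile'

# Hypothetical, illustrative mapping from MIME type to a file extension.
# The exact strings reported by `file` may differ between versions.
MIME_EXTENSIONS = {
  'audio/mpeg'   => '.mp3',
  'audio/x-wav'  => '.wav',
  'audio/wav'    => '.wav',
  'audio/flac'   => '.flac',
  'audio/x-flac' => '.flac',
  'audio/ogg'    => '.ogg'
}.freeze

# Returns the extension for a MIME type, or '' when the type is unknown.
def extension_for_mime(mime_type)
  MIME_EXTENSIONS.fetch(mime_type.to_s.strip.downcase, '')
end

# Runs `file --brief --mime-type` on a local path.
# Returns the MIME type string, or nil if the command fails.
def detect_mime_type(path)
  out, status = Open3.capture2('file', '--brief', '--mime-type', path)
  status.success? ? out.strip : nil
end

# Copies the source into a tempfile whose name ends with the detected
# extension, so extension-sensitive tools can read it.
def tempfile_with_detected_extension(source_path, basename = 'fixer')
  ext = extension_for_mime(detect_mime_type(source_path))
  tmp = Tempfile.new([basename, ext])
  tmp.binmode
  File.open(source_path, 'rb') { |src| IO.copy_stream(src, tmp) }
  tmp.flush
  tmp
end
```

Keeping the MIME lookup separate from the shell call means the mapping can be checked without `file` installed.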

Another option is to convert the soxi-dependent methods in audio_monster to use ffmpeg, which does a better job of detecting the type without a file extension.

pkarman commented 9 years ago

those all sound like good things to do, in addition to supporting explicitly passing in the format. +1

kookster commented 9 years ago

Seems like the best option may be ffprobe, which can spit out nice JSON; from each stream's codec_name and from format.format_name you can figure out what the file should be pretty well.

ffprobe -v quiet -print_format json -show_format -show_streams test_file_no_extension
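As a rough sketch of how that JSON could be consumed from Ruby, assuming `ffprobe` is installed: the helper names (`ffprobe_info`, `guess_format`) and the `CODEC_EXTENSIONS` table are hypothetical and do not reflect how audio_monster actually does it. The heuristic trusts `format.format_name` when it names a single container, and falls back to the audio stream's `codec_name` when ffprobe reports a comma-separated list.

```ruby
require 'json'
require 'open3'

# Hypothetical, illustrative mapping from audio codec name to extension.
CODEC_EXTENSIONS = {
  'mp3'    => 'mp3',
  'mp2'    => 'mp2',
  'flac'   => 'flac',
  'vorbis' => 'ogg',
  'aac'    => 'm4a'
}.freeze

# Runs ffprobe with the flags from the comment above.
# Returns the parsed JSON as a Hash, or nil if ffprobe fails.
def ffprobe_info(path)
  out, status = Open3.capture2(
    'ffprobe', '-v', 'quiet', '-print_format', 'json',
    '-show_format', '-show_streams', path
  )
  status.success? ? JSON.parse(out) : nil
end

# Guesses an extension from a parsed ffprobe Hash.
# Returns nil when nothing usable is found.
def guess_format(info)
  audio = Array(info['streams']).find { |s| s['codec_type'] == 'audio' }
  codec = audio && audio['codec_name']
  names = info.dig('format', 'format_name').to_s.split(',')
  # A single name such as "mp3" or "wav" is unambiguous.
  return names.first if names.length == 1
  # A list such as "mov,mp4,m4a,3gp,3g2,mj2" is ambiguous: use the codec.
  CODEC_EXTENSIONS[codec]
end
```

Typical use would be `guess_format(ffprobe_info(path))` after checking that `ffprobe_info` did not return nil.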

kookster commented 9 years ago

I have updated audio_monster so it doesn't care about file extension anymore; it now relies on ffprobe instead of soxi. I also added better format detection to audio_monster, so at least for audio it can be smarter about detecting the format regardless of file name.

Next up is updating fixer to use the latest audio_monster, and to handle original_format when it is passed in via the API call, or to detect the format better when it isn't.
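The precedence described here (use the passed-in format if present, otherwise fall back to detection) could look roughly like the following. `resolve_format` is a hypothetical helper written for illustration, not fixer's actual code, and it assumes formats are plain extension-like strings such as "mp3".

```ruby
# Hypothetical helper: prefer an explicitly supplied original_format,
# otherwise use whatever detection produced. Returns nil if neither is usable.
def resolve_format(explicit_format, detected_format)
  # Normalize: trim whitespace, lowercase, drop a leading dot (".MP3" -> "mp3").
  explicit = explicit_format.to_s.strip.downcase.sub(/\A\./, '')
  return explicit unless explicit.empty?

  detected = detected_format.to_s.strip.downcase
  detected.empty? ? nil : detected
end
```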

kookster commented 9 years ago

This was fixed by #39