barnslig / torture

FTP search based on Go and Elasticsearch for the 31st Chaos Communication Congress
MIT License

Load a chunk and analyze it #9

Open barnslig opened 9 years ago

barnslig commented 9 years ago

Load a chunk of each file (50 kB, 1 MB, or whatever makes sense), extract metadata from it with ffmpeg, and compute a checksum for duplicate detection. This could give us interesting data for the search, especially when searching for music.

However, because this feature is heavy on network traffic and generates a lot of load, we should do it last; the filename and path should give us enough for a good search most of the time.

corny commented 9 years ago

I implemented this feature in my FTP crawler written in Ruby. It might help you:

https://github.com/digineo/media_crawler/blob/v1/app/models/resource/chunk.rb
https://github.com/digineo/media_crawler/blob/v1/app/models/resource/metadata.rb