Open baruchiro opened 9 months ago
Hi,
I was thinking of tackling this one using this library. While the http package has a MIME type sniffing function, this library has the benefit of a MIME type hierarchy, so the distinction between binary and text comes for free.
What do you think?
@nargov from their documentation:
Only use libraries like mimetype as a last resort. Content type detection using magic numbers is slow, inaccurate, and non-standard
I don't want to harm our performance, and this library makes us read each file at least twice.
I'm looking for a way to reduce the scanning of binaries without causing huge performance issues on one hand, and without doing magic behind the user's back on the other. For example, the last time we saw this problem, we added the max-target-megabytes flag to skip large files. Here, the only thing I can think of is to somehow measure how long a task takes for a specific file, and warn in the log about a potential performance issue.
By the way, I'm sorry for the late response, I was sick. I appreciate your help!
As an alternative, I see https://pkg.go.dev/net/http#DetectContentType reads at most 512 bytes to detect the MIME type. Think it's good enough?
OK, I think we can create a POC for that. Here is what I'm thinking:
[]byte.
You don't have to answer all the questions before you start developing.
Another option would be to ignore lines that are too long. On one hand, such lines may come from a binary file. On the other hand, they may come from a minified JS file.
Steps to reproduce:
1. Build with go build -o 2ms main.go
2. Run a filesystem scan with ./2ms filesystem --path . --log-level debug
3. Note that the scan includes the ./2ms executable itself.
There are two problems here: