gabriel-vasile / mimetype

A fast Golang library for media type and file extension detection, based on magic numbers
https://pkg.go.dev/github.com/gabriel-vasile/mimetype#pkg-overview
MIT License

Add support for yaml file format #88

Open gabriel-vasile opened 4 years ago

gabriel-vasile commented 4 years ago

1) Specify the MIME type and extension for which to add support: application/x-yaml or text/yaml
2) Share an example file: https://github.com/kubernetes-sigs/kustomize/blob/master/examples/helloWorld/configMap.yaml
3) Optionally, add a reference to the specification of the file format: https://yaml.org

My approach to this would be similar to the JSON detection. A scanner can validate YAML text and return the index where a possible error occurred.
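To make the "scanner returns the index of the error" idea concrete, here is a minimal sketch in Go. It is neither the library's JSON scanner nor a YAML validator: `scanBrackets` is a hypothetical toy that only checks bracket nesting and skips double-quoted strings. What it illustrates is the contract: walk the header once, report the index of the first byte that cannot be valid, and treat input that is merely cut off as "no error so far", since only the start of the file is inspected.

```go
package main

import "fmt"

// scanBrackets is a toy stand-in for a real format scanner (hypothetical
// name, not part of the mimetype API). It tracks open '{' and '[' on a
// stack, skips over double-quoted strings, and returns the index of the
// first byte that breaks the nesting. It returns -1 when no error is found,
// which includes input that is only truncated (unclosed brackets or strings).
func scanBrackets(header []byte) int {
	var stack []byte
	inString, escaped := false, false
	for i, c := range header {
		if inString {
			switch {
			case escaped:
				escaped = false // this byte was escaped, ignore it
			case c == '\\':
				escaped = true
			case c == '"':
				inString = false
			}
			continue
		}
		switch c {
		case '"':
			inString = true
		case '{', '[':
			stack = append(stack, c)
		case '}', ']':
			if len(stack) == 0 {
				return i // closing bracket with nothing open
			}
			open := stack[len(stack)-1]
			if (c == '}' && open != '{') || (c == ']' && open != '[') {
				return i // closing bracket of the wrong kind
			}
			stack = stack[:len(stack)-1]
		}
	}
	return -1
}

func main() {
	fmt.Println(scanBrackets([]byte(`{"a": [1, 2`))) // truncated but fine so far: -1
	fmt.Println(scanBrackets([]byte(`{"a": ]`)))     // mismatch at index 6
}
```

A detector built on such a scanner would accept the header when the result is -1 and reject it otherwise. A real YAML scanner has to track far more state (indentation, block scalars, anchors, flow versus block context), so this only shows the shape of the approach.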

tebrizetayi commented 4 years ago

Is someone working on this issue?

gabriel-vasile commented 4 years ago

As far as I know, no one is working on YAML. However, you should know that writing a YAML scanner is quite a time-consuming task. If you want to go for other file formats, off the top of my head I can name: cpio, lzip, Java archives, CorelDRAW files, zoo archives, BitTorrent files.

Good resources for how to identify these formats: https://www.garykessler.net/library/file_sigs.html https://github.com/file/file

tebrizetayi commented 4 years ago

> However, you should know that writing a YAML scanner is quite a time-consuming task.

Why don't we use the standard Golang library for YAML?

> If you want to go for other file formats, off the top of my head I can name: cpio, lzip, Java archives, CorelDRAW files, zoo archives, BitTorrent files.

JAR archives are already done. I can work on one of the others as well.

tebrizetayi commented 4 years ago

I want to write a matcher for CorelDRAW files. To check them, though, I also need to pass the file size to the matcher function, and the matcher function accepts only one argument. How can we pass the file size along with the byte array to the matcher function?

gabriel-vasile commented 4 years ago

Unfortunately, the size of the file is not available. The main reason is that the library limits itself to reading just the header of a file in order to save memory.

I'm not familiar with the Corel file format, but after reading the Wikipedia info and how others detect it, I think it can be detected without knowing the length.

As far as I can see, the official Corel file format specification is not publicly available. Can you please link your source so we can compare it with what I've found and sort this out?

tebrizetayi commented 4 years ago

https://www.ntfs.com/corel-draw-format.htm

The second byte fragment is for checking the file size. I checked it with my own .cdr file and it works.

gabriel-vasile commented 4 years ago

I guess you can skip the file size check and just check for the magic numbers. Don't forget to add all the aliases this MIME type has. (tika) :relaxed:
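A minimal sketch of such a size-independent check, written as a one-argument matcher. It assumes the RIFF-based layout for CorelDRAW files: the ASCII bytes "RIFF" at offsets 0 to 3, a 4-byte size field at offsets 4 to 7, and "CDR" at offsets 8 to 10. That layout is an assumption for illustration and should be verified against real .cdr samples. The function name `looksLikeCDR` is hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
)

// looksLikeCDR is a hypothetical matcher that takes only the file header.
// Assumed layout: "RIFF" at bytes 0-3, a size field at bytes 4-7 (skipped
// here, because the real file length is not available), "CDR" at bytes 8-10.
func looksLikeCDR(header []byte) bool {
	if len(header) < 11 {
		return false
	}
	return bytes.HasPrefix(header, []byte("RIFF")) &&
		bytes.Equal(header[8:11], []byte("CDR"))
}

func main() {
	cdr := []byte("RIFF\x10\x00\x00\x00CDRvvrsn")
	wav := []byte("RIFF\x10\x00\x00\x00WAVEfmt ")
	fmt.Println(looksLikeCDR(cdr)) // true
	fmt.Println(looksLikeCDR(wav)) // false
}
```

Skipping bytes 4 to 7 keeps the matcher independent of the file length, while the bytes at offset 8 still separate CorelDRAW from other RIFF containers such as WAV or AVI.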