cgsecurity / testdisk

TestDisk & PhotoRec
https://www.cgsecurity.org/
GNU General Public License v2.0

File Detection... #107

Closed: busterbeam closed this issue 2 years ago

busterbeam commented 2 years ago

Just a small sample of output from running the file program on Linux:

f203547504.h:       ASCII text
f203548016.h:       ASCII text
f203548144.xml:     XML 1.0 document, ASCII text
f203548664.xml:     XML 1.0 document, ASCII text, with very long lines, with CRLF line terminators
f203548952.xml:     XML 1.0 document, UTF-8 Unicode (with BOM) text, with CRLF line terminators
f203548984.java:    Python script, ASCII text executable
f203548992.java:    Python script, ASCII text executable
f203549048.java:    Python script text executable Python script, ASCII text executable
f203549160.xml:     XML 1.0 document, ASCII text
f203551216.h:       ASCII text
f203553032.h:       ASCII text
f203553168.dex:     Dalvik dex file version 035
f203553176.dex:     Dalvik dex file version 035
f203553272.java:    Java source, ASCII text
f203553304.java:    Java source, ASCII text
f203553312.java:    Java source, ASCII text
f203553440.java:    Java source, ASCII text
f203553456.java:    Java source, ASCII text
f203554608.java:    C++ source, ASCII text
f203554640.java:    Java source, ASCII text
f203554648.java:    Java source, ASCII text
f203554696.java:    Java source, ASCII text
f203554728.java:    Java source, ASCII text

I would suggest using what file uses to categorise files, as its results seem to be more accurate.
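For anyone wanting to reproduce this kind of comparison, here is a minimal Python sketch that parses file output like the listing above and flags entries where the extension and the description disagree. The EXPECTED mapping and the function names are assumptions made up for illustration; they are not part of PhotoRec or of file/libmagic.

```python
import os

# Hypothetical mapping (an assumption for this sketch): for each extension,
# a substring we expect to see in the description printed by `file`.
EXPECTED = {
    ".java": "Java source",
    ".xml": "XML",
    ".dex": "Dalvik dex",
}


def parse_file_output(text):
    """Split lines of `file` output into (filename, description) pairs."""
    pairs = []
    for line in text.splitlines():
        # `file` prints "name: description"; split on the first colon only.
        name, sep, desc = line.partition(":")
        if sep and name.strip():
            pairs.append((name.strip(), desc.strip()))
    return pairs


def find_mismatches(pairs, expected=EXPECTED):
    """Return the pairs whose description lacks the expected substring."""
    mismatches = []
    for name, desc in pairs:
        ext = os.path.splitext(name)[1].lower()
        want = expected.get(ext)
        # Extensions without an entry in the mapping are not checked.
        if want is not None and want not in desc:
            mismatches.append((name, desc))
    return mismatches


# A few lines copied from the listing above.
SAMPLE = """\
f203548984.java:    Python script, ASCII text executable
f203553272.java:    Java source, ASCII text
f203554608.java:    C++ source, ASCII text
f203553168.dex:     Dalvik dex file version 035
"""

if __name__ == "__main__":
    for name, desc in find_mismatches(parse_file_output(SAMPLE)):
        print(f"{name}: file reports {desc!r}")
```

A disagreement only shows that the two tools differ, not which one is right; the original file would have to be inspected to decide.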

Great work on this program though, it has helped me a lot.

cgsecurity commented 2 years ago

PhotoRec classification for text files is far from perfect but libmagic results aren't always better. file/libmagic won't be used.

busterbeam commented 2 years ago

Ok, fair.

I will say, what makes me laugh the most is when it confuses Java with Python.