CIRCL / AIL-framework

AIL framework - Analysis Information Leak framework. Project moved to https://github.com/ail-project
https://github.com/ail-project/ail-framework
GNU Affero General Public License v3.0

Can't import files with non alphanumeric chars in path #475

Closed: Nedfire2347 closed this issue 8 months ago

Nedfire2347 commented 4 years ago

Hello again, I'm currently working with @src7 on some dumps, and we can't import some of the samples with bin/import_dir.py because of their file names.

Examples: a folder named "Collection #1_BTC combos", and an import hierarchy containing files named "api_scrape_item.php?i=aZe0Rt1Y".

Relevant code: https://github.com/CIRCL/AIL-framework/blob/1f8c858c777a7da59134d257d7defb464dc487e5/bin/import_dir.py#L70

Note: this could also lead to an exploitable vulnerability.

src7 commented 4 years ago

Hello,

For now, I use this workaround to batch-rename files such as to_import/1970/01/01/api_scrape_item.php?i=aZe0Rt1Y to to_import/1970/01/01/aZe0Rt1Y.txt, which is very fast.

Command (run in the to_import folder; add -n first to preview the renames without applying them):

mmv ';*\=*' '#1#3.txt'
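For readers without mmv, the same batch rename can be sketched in Python with pathlib. This is a minimal sketch, not part of AIL: rename_scraped_files is a hypothetical helper, and it assumes that every file to rename contains an "=" and that the text after the last "=" is the ID to keep. Like mmv -n, dry_run=True only reports the planned renames.

```python
from pathlib import Path
from typing import List, Tuple


def rename_scraped_files(root: Path, dry_run: bool = True) -> List[Tuple[Path, Path]]:
    """Rename files like 'api_scrape_item.php?i=ID' under root to 'ID.txt'.

    Hypothetical helper mirroring the mmv workaround. With dry_run=True
    nothing is renamed; the planned (source, target) pairs are returned.
    """
    planned = []
    # sorted() materializes the listing before any rename happens.
    for path in sorted(root.rglob("*")):
        if not path.is_file() or "=" not in path.name:
            continue
        # Assumption: keep only the text after the last '=' and add '.txt'.
        target = path.with_name(path.name.rpartition("=")[2] + ".txt")
        if target.exists():
            # Skip instead of silently overwriting an existing file.
            continue
        planned.append((path, target))
        if not dry_run:
            path.rename(target)
    return planned
```

Usage: call rename_scraped_files(Path("to_import")) to review the plan, then rerun with dry_run=False to apply it.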

Terrtia commented 4 years ago

Hi @Nedfire2347 @src7! Thanks for the report!

The issue is caused by the whitespace in the path (the Mixer module uses whitespace as a separator).
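To illustrate why a space in the path breaks things, here is a sketch assuming a hypothetical message format where the path and the payload are joined by a single space. build_message and parse_message are illustrative names, not the Mixer module's actual code.

```python
from typing import Tuple


def build_message(path: str, payload: str) -> str:
    """Join path and payload with a single space (hypothetical format)."""
    # Refuse paths containing whitespace: the receiver could not tell
    # where the path ends and the payload begins.
    if any(ch.isspace() for ch in path):
        raise ValueError(f"whitespace in path would corrupt the message: {path!r}")
    return f"{path} {payload}"


def parse_message(message: str) -> Tuple[str, str]:
    """Split on the first space: left side is the path, the rest is payload."""
    path, _, payload = message.partition(" ")
    return path, payload


# Without the guard, a path with spaces is split in the wrong place:
broken = parse_message("Collection #1_BTC combos/a.gz <payload>")
# broken == ("Collection", "#1_BTC combos/a.gz <payload>")
```

The receiver sees "Collection" as the path and everything else as payload, so the item is attributed to the wrong file.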

Fixed with 72fe8a2

src7 commented 4 years ago

Hi,

What about the ? and = characters?

Terrtia commented 4 years ago

You're right! Some special characters are now removed from file names, with bdf2fce332312554e574682b26b13793b4963f78.
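A minimal sketch of stripping special characters from a path with an allow-list regex. The allowed set below (ASCII letters, digits, ".", "_", "-" and "/") is an assumption; the exact characters removed by commit bdf2fce may differ, and sanitize_path is a hypothetical name.

```python
import re

# Hypothetical allow-list: ASCII letters, digits, '.', '_', '-' and '/'.
# Every other character (spaces, '?', '=', '&', '#', ...) is dropped.
_DISALLOWED = re.compile(r"[^A-Za-z0-9._/-]")


def sanitize_path(path: str) -> str:
    """Remove every character outside the allow-list from path."""
    return _DISALLOWED.sub("", path)
```

Because characters are dropped rather than replaced, distinct names can collapse to the same result: "&.txt" and "?.txt" both become ".txt". This allow-list also keeps "." and "/", so on its own it does not prevent "../" path traversal.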

src7 commented 4 years ago

Nice

(It's a bit extreme: does that mean two files named &.txt and ?.txt can't coexist? Not a problem for me.)

src7 commented 4 years ago

First, there is a new dependency to add: python3-magic.

Also, is this output expected? Some files are not gzipped.

import_dir/pastebin>>to_import/2020/01/09/xqjKa4qa.txt.gz
import_dir/pastebin>>to_import/2020/01/09/xqShrUPm.txt
import_dir/pastebin>>to_import/2020/01/09/xqRx2kKG.txt.gz
import_dir/pastebin>>to_import/2020/01/09/xqDegaHQ.txt
import_dir/pastebin>>to_import/2020/01/09/xpvqC3p5.txt.gz
import_dir/pastebin>>to_import/2020/01/09/xpmfG0Fx.txt.gz
import_dir/pastebin>>to_import/2020/01/09/xpi1fgiw.txt.gz
import_dir/pastebin>>to_import/2020/01/09/xpDbbCTK.txt.gz
import_dir/pastebin>>to_import/2020/01/09/xnw2wdy2.txt
import_dir/pastebin>>to_import/2020/01/09/xnvEzU17.txt.gz

Terrtia commented 4 years ago

Thanks for the feedback! All files to import must be gzipped. I removed the python3-magic dependency (it is already listed in the requirements). The importer uses the magic number to check whether a file is gzip-compressed.

Fixed with 873797d87f52fe5e8b7df644b3b1cc3641d090ce
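A minimal sketch of such a magic-number check: it compares the first two bytes with the gzip magic number 1f 8b defined in RFC 1952 and compresses anything that does not match. This is a simplified stand-in for a python-magic based check, and is_gzip and ensure_gzip are hypothetical helpers, not the importer's code.

```python
import gzip

# The first two bytes of every gzip member (RFC 1952, fields ID1 and ID2).
GZIP_MAGIC = b"\x1f\x8b"


def is_gzip(data: bytes) -> bool:
    """Return True if data starts with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


def ensure_gzip(data: bytes) -> bytes:
    """Return data unchanged if it is already gzip, else gzip-compress it."""
    return data if is_gzip(data) else gzip.compress(data)
```

Applied to the plain files in the log above (for example xqShrUPm.txt), ensure_gzip would yield content that can be stored with a .gz suffix like the others.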

Files with the same path but different content are renamed by the Global module (using a UUIDv4).
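A sketch of that collision rule, assuming the stored file is compared byte-for-byte with the new content and, when they differ, the new content is written under a random UUIDv4 name that keeps the original suffixes. store_item and this naming scheme are hypothetical; the Global module may compare hashes and name files differently.

```python
import uuid
from pathlib import Path
from typing import Optional


def store_item(dest: Path, data: bytes) -> Optional[Path]:
    """Write data to dest, keeping both versions on a path collision.

    Returns the path written, or None if an identical file already exists.
    """
    if dest.exists():
        if dest.read_bytes() == data:
            return None  # same path and same content: nothing to store
        # Same path, different content: hypothetical UUIDv4 file name,
        # keeping suffixes such as '.txt.gz'.
        dest = dest.with_name(f"{uuid.uuid4()}{''.join(dest.suffixes)}")
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_bytes(data)
    return dest
```

A random UUIDv4 avoids having to probe for a free name such as file_1, file_2, and makes a second collision vanishingly unlikely.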