
Fengrui-Liu / nad21_ictfi

Best score: 0.6027936002

5 stars 0 forks

Help with datasets #1

Open zixuan1zhang opened 2 years ago

zixuan1zhang commented 2 years ago

I recently read your article. I tried to find the source of the ZYELL-NCTU NetTraffic dataset used in its experiments so that I could use it in my own, but I could not find it. Is this dataset freely available? If so, could you share it with me? I would greatly appreciate it.

Fengrui-Liu commented 2 years ago

https://drive.google.com/drive/folders/1qZQGYqAw6lTWRhEy6-VI8ldYbuEdZCYX?usp=sharing

Thanks for the reminder; I have updated the README. If this is helpful, please cite

zixuan1zhang commented 2 years ago

Thank you very much. I've got it.

zixuan1zhang commented 2 years ago

I have another question. I noticed that the extracted dataset includes a file named <license>. If I use this dataset to run experiments and publish articles, are there copyright issues with the dataset? Is it enough to cite <ZYELL-NCTU NetTraffic-1.0: A Large-Scale Dataset for Real-World Network Anomaly Detection>, or are other steps also required, such as contacting the authors to obtain their consent to use the dataset? I know little about this and look forward to your reply. Thank you very much.

Fengrui-Liu commented 2 years ago

Take it easy; you just need to cite the <ZYELL-NCTU NetTraffic-1.0> paper.

zixuan1zhang commented 2 years ago

That's really good. Thank you very much.

zixuan1zhang commented 2 years ago

I'd like to ask you some questions about using CatBoost. I am using CatBoost for a multiclass classification problem. Because CatBoost runs too slowly on the CPU during hyperparameter tuning, I switched to running it on the GPU. However, there is a problem: even with the same random seed, the results differ from one training run to the next, which makes it impossible to tune the parameters to find the best model. I looked for an answer online and found this explanation: "GPU training is non-deterministic, because it uses atomic addition of derivatives, so every time your GPU training results will be slightly different, but it won't have much impact on the quality." How should I deal with this problem? I look forward to your reply. Thank you very much.

Fengrui-Liu commented 2 years ago

@zixuan1zhang If the built-in random seed does not help, I don't have many ideas about this problem. It seems to be a CUDA issue (not sure about that).

If you want to report the result, consider an evaluation metric together with its variance, such as an F1 score of 0.95 (±0.1).
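A minimal sketch of that kind of reporting, not code from this repository: train several times, compute the F1 score of each run, and report the mean with the sample standard deviation. The helper names (`macro_f1`, `summarize_runs`, `noisy_predictions`) are hypothetical, and the "runs" are simulated with random label flips standing in for real GPU training.

```python
import random
import statistics


def macro_f1(y_true, y_pred):
    """Macro-averaged F1: the unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def summarize_runs(scores):
    """Return (mean, sample standard deviation) over repeated runs."""
    return statistics.mean(scores), statistics.stdev(scores)


def noisy_predictions(y_true, labels, flip_rate, rng):
    """Hypothetical stand-in for one non-deterministic training run:
    each true label is replaced by a random label with probability flip_rate."""
    return [rng.choice(labels) if rng.random() < flip_rate else t for t in y_true]


if __name__ == "__main__":
    rng = random.Random(0)
    labels = [0, 1, 2]
    y_true = [rng.choice(labels) for _ in range(300)]
    # In practice each score would come from retraining the model once.
    run_scores = [
        macro_f1(y_true, noisy_predictions(y_true, labels, 0.1, rng))
        for _ in range(5)
    ]
    mean, std = summarize_runs(run_scores)
    print(f"macro F1: {mean:.3f} (+-{std:.3f})")
```

When comparing two parameter settings this way, a difference smaller than the run-to-run spread is probably not meaningful.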

zixuan1zhang commented 2 years ago

Thank you for your reply. I can only try again.