MichaelYin1994 / tianchi-trajectory-data-mining

Rank 3 solution for the algorithm stage of the Tianchi DCIC 2020 vessel trajectory data mining competition:
https://tianchi.aliyun.com/competition/entrance/231768/introduction
MIT License
machine-learning tianchi-competition trajectory-analysis

DCIC 2020 Digital China Innovation Contest, Digital Government track: Rank 3 solution for the Smart Ocean Construction task


Team

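Team "liu123的航空母舰队" (liu123's Aircraft Carrier Fleet), captained by 鱼丸粗面 (zhuoyin94@163.com). In the second round, the team scored an F1 of 0.8995 in the algorithm stage (3/3275) and 21.0 in the visualization stage (7/14). Note: the rank quoted above refers to the algorithm stage.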

Main package dependencies and runtime environment


Overview of the approach

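This project combines traditional statistical machine learning with trajectory data mining [1]. Feature engineering has two main parts: basic statistical features and trajectory embedding features. XGBoost and LightGBM serve as the base models. A brief introduction follows: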
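To make the "basic statistical features" idea concrete, here is a minimal sketch of aggregating per-point trajectory records into one feature row per vessel. It is not the repository's code: the record keys (`ship_id`, `x`, `y`, `speed`), the function name `basic_stat_features`, and the chosen statistics are all assumptions made for illustration, and only the Python standard library is used so the example stays dependency-free.

```python
from collections import defaultdict
from statistics import mean, pstdev


def basic_stat_features(records):
    """Aggregate per-point records into one feature dict per vessel.

    `records` is an iterable of dicts with the hypothetical keys
    "ship_id", "x", "y" and "speed" (assumed names, not from the repo).
    """
    # Group the raw points by vessel identifier.
    by_ship = defaultdict(list)
    for rec in records:
        by_ship[rec["ship_id"]].append(rec)

    features = {}
    for ship_id, points in by_ship.items():
        speeds = [p["speed"] for p in points]
        xs = [p["x"] for p in points]
        ys = [p["y"] for p in points]
        features[ship_id] = {
            "n_points": len(points),
            "speed_mean": mean(speeds),
            "speed_std": pstdev(speeds),   # population std; 0.0 for a single point
            "speed_max": max(speeds),
            "x_range": max(xs) - min(xs),  # spatial extent along x
            "y_range": max(ys) - min(ys),  # spatial extent along y
        }
    return features


# Tiny made-up sample: two points for vessel 1, one point for vessel 2.
sample = [
    {"ship_id": 1, "x": 0.0, "y": 0.0, "speed": 2.0},
    {"ship_id": 1, "x": 3.0, "y": 4.0, "speed": 4.0},
    {"ship_id": 2, "x": 1.0, "y": 1.0, "speed": 0.0},
]
feats = basic_stat_features(sample)
```

Each resulting row is a fixed-length summary of a variable-length trajectory, which is the shape of input that tree-based models such as XGBoost or LightGBM expect. In practice these rows would be joined with the embedding features before training.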


Code files

Preprocessing

POI information mining

Embedding

Feature engineering

Model training

Utility files


Documentation and slides (Baidu Netdisk link)


References

[1] Zheng Y. Trajectory Data Mining: An Overview[J]. ACM Transactions on Intelligent Systems and Technology, 2015, 6(3): 1-41.

[2] Schafer R W. What Is a Savitzky-Golay Filter? [Lecture Notes][J]. IEEE Signal Processing Magazine, 2011, 28(4): 111-117.

[3] Greg Welch, Gary Bishop. An Introduction to the Kalman Filter[M]. University of North Carolina at Chapel Hill, 1995.

[4] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient Estimation of Word Representations in Vector Space. ICLR Workshop, 2013.

[5] T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean. Distributed Representations of Words and Phrases and their Compositionality. NIPS, 2013.

[6] Palma A T, Bogorny V, Kuijpers B, et al. A clustering-based approach for discovering interesting places in trajectories[C]//Proceedings of the 2008 ACM symposium on Applied computing. 2008: 863-868.

[7] Zhang A, Song S, Wang J. Sequential data cleaning: a statistical approach[C]//Proceedings of the 2016 International Conference on Management of Data. 2016: 909-924.

[8] https://github.com/RaRe-Technologies/gensim/issues/641