Since we're already changing things, this is a good time to normalize the frequently used free-text fields to common names.
This PR does exactly that, plus some changes to make the papers v2.2-compatible:
fixed features in old papers where the verifier didn't account for context
removed all empty lists: in mandatory fields they were converted to either "missing" or "none", and in optional fields they were simply dropped
replaced many free-text values (full list below)
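The empty-list handling described above can be sketched roughly as follows. This is only an illustration, not the script used for this PR: the function name, the flat record layout, and the example field names (`tools`, `features`) are hypothetical, and the choice between "missing" and "none" still has to be made per paper, so it is passed in as a parameter.

```python
def normalize_empty_lists(record, mandatory, placeholder="missing"):
    """Return a copy of a flat record with empty lists cleaned up.

    Empty lists in mandatory fields become `placeholder` ("missing" or
    "none", decided by whoever curates the paper); empty lists in
    optional fields are dropped. Everything else is copied unchanged.
    """
    cleaned = {}
    for key, value in record.items():
        if isinstance(value, list) and not value:
            if key in mandatory:
                cleaned[key] = placeholder
            # optional and empty: leave the key out entirely
            continue
        cleaned[key] = value
    return cleaned


# Hypothetical record: "tools" is treated as mandatory, "features" as optional.
example = {"tools": [], "features": [], "name": "Snort"}
result = normalize_empty_lists(example, mandatory={"tools"}, placeholder="none")
```

Here `result` keeps `name`, turns `tools` into the placeholder, and has no `features` key.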
The checks will not pass for now, as the specification is yet to be updated.
This should only be merged after the checks pass.
The full list of replacements follows, as a CSV file:
field,old,new,note
reference.publication_name,acm sigcomm computer communication review,ACM SIGCOMM Computer Communication Review,
reference.publication_name,ACM SIGCOMM conference on Internet measurement,ACM SIGCOMM conference on Internet Measurement,
reference.publication_name,computer networks,Computer Networks,
reference.publication_name,Expert Systems With Applications,Expert Systems with Applications,
reference.publication_name,IEEE/ACM Transactions on Networking,IEEE/ACM Transactions on Networking (TON),
reference.publication_name,ieee transactions on network service management,IEEE Transactions on Network and Service Management,
reference.publication_name,IMC,ACM SIGCOMM conference on Internet measurement,
reference.publication_name,Internet Measurement Conference,ACM SIGCOMM conference on Internet measurement,
reference.publication_name,"Proceedings of the 2005 conference on Applications, technologies, architectures, and protocols for computer communications","Proceedings of the 2005 conference on applications, technologies, architectures, and protocols for computer communications",
reference.publication_name,SIGCOMM,ACM SIGCOMM conference on Internet Measurement,
reference.bibtex.type,INPROCEEDINGS,inproceedings,
reference.curated_by,dferreira,"ferreira, d.",
data.datasets.name,abilene,Abilene,
data.datasets.name,geant,GÉANT,
data.datasets.name,Géant,GÉANT,
data.datasets.name,GÈANT,GÉANT,
data.datasets.name,genome_campus,Genome campus,
data.datasets.name,genome_campus-2004,Genome campus,
data.datasets.name,darpa-ideval-1999,DARPA IDeval,
data.datasets.name,darpa_IDeval-2000,DARPA IDeval,
data.datasets.name,mawi-2015,MAWI,
data.datasets.name,mawi wide,WIDE,
data.datasets.name,wide,WIDE,
data.datasets.dataset_name,abilene,Abilene,v2.0 field name; see if it's possible to update version
data.datasets.dataset_name,geant,GÉANT,v2.0 field name
data.datasets.dataset_name,Géant,GÉANT,v2.0 field name
data.datasets.dataset_name,GÈANT,GÉANT,v2.0 field name
data.datasets.dataset_name,genome_campus,Genome campus,v2.0 field name
data.datasets.dataset_name,genome_campus-2004,Genome campus,v2.0 field name
data.datasets.dataset_name,darpa-ideval-1999,DARPA IDeval,v2.0 field name
data.datasets.dataset_name,darpa_IDeval-2000,DARPA IDeval,v2.0 field name
data.datasets.dataset_name,mawi-2015,MAWI,v2.0 field name
data.datasets.dataset_name,mawi wide,WIDE,v2.0 field name
data.datasets.dataset_name,wide,WIDE,v2.0 field name
data.datasets.types,_tls,tls,
data.datasets.covered_period,_months,months,
preprocessing.tools.name,argus,Argus,
preprocessing.tools.name,Netflow,NetFlow,
preprocessing.tools.name,NetMate,NETMATE,
preprocessing.tools.name,Netmate,NETMATE,
preprocessing.tools.name,_own_csharp_tools,own_csharp_tools,
preprocessing.tools.name,_own_java_scripts,own_java_scripts,
preprocessing.tools.name,_own_matlab_scripts,own_matlab_scripts,
preprocessing.tools.name,_own_perl_scripts,own_perl_scripts,
preprocessing.tools.name,_own_r_scripts,own_r_scripts,
preprocessing.tools.name,snort,Snort,
preprocessing.tools.name,weka,WEKA,
preprocessing.tools.name,Weka,WEKA,
preprocessing.tools.tool,argus,Argus,v2.0 field name
preprocessing.tools.tool,Netflow,NetFlow,v2.0 field name
preprocessing.tools.tool,NetMate,NETMATE,v2.0 field name
preprocessing.tools.tool,Netmate,NETMATE,v2.0 field name
preprocessing.tools.tool,_own_csharp_tools,own_csharp_tools,v2.0 field name
preprocessing.tools.tool,_own_java_scripts,own_java_scripts,v2.0 field name
preprocessing.tools.tool,_own_matlab_scripts,own_matlab_scripts,v2.0 field name
preprocessing.tools.tool,_own_perl_scripts,own_perl_scripts,v2.0 field name
preprocessing.tools.tool,_own_r_scripts,own_r_scripts,v2.0 field name
preprocessing.tools.tool,snort,Snort,v2.0 field name
preprocessing.tools.tool,weka,WEKA,v2.0 field name
preprocessing.tools.tool,Weka,WEKA,v2.0 field name
preprocessing.transformations,__histograms,_histograms,
preprocessing.transformations,__feature_reduction,_feature_reduction,
preprocessing.final_data_format,__numerical_series,_numerical_series,
preprocessing.feature_selections.name,CFS,Correlation-based Feature Selection (CFS),
preprocessing.feature_selections.name,correlation based,Correlation-based Feature Selection (CFS),
preprocessing.feature_selections.name,correlation-based feature selection,Correlation-based Feature Selection (CFS),
preprocessing.feature_selections.name,Correlation-based Feature Subset Selection,Correlation-based Feature Selection (CFS),
preprocessing.feature_selections.name,Correlation-based Filter,Correlation-based Feature Selection (CFS),
preprocessing.feature_selections.name,consistency based,Consistency-based Feature Selection,
preprocessing.feature_selections.name,consistency-based feature selection,Consistency-based Feature Selection,
preprocessing.feature_selections.name,Consistency-based subset evaluation,Consistency-based Feature Selection,
preprocessing.feature_selections.name,FCBF (information gain-based),Fast Correlation-Based Filter (FCBF),
preprocessing.feature_selections.name,feature selection with fast correlation based-filter (FCBF) ,Fast Correlation-Based Filter (FCBF),
preprocessing.feature_selections.name,PCA,Principal Component Analysis (PCA),
preprocessing.feature_selections.classifier,naive_bayes,Naive Bayes,
preprocessing.feature_selections.classifier,support vector machine,Support Vector Machine (SVM),
preprocessing.feature_selections.classifier,SVM,Support Vector Machine (SVM),
analysis_method.tools.name,LS-SVM lab,LS-SVMlab,
analysis_method.tools.name,Weka,WEKA,
analysis_method.tools.name,weka,WEKA,
analysis_method.tools.tool,weka,WEKA,v2.0 field name
analysis_method.algorithms.name,Adaboost with Decision Stumps,AdaBoost,
analysis_method.algorithms.name,Bayes Net,Bayesian Network,
analysis_method.algorithms.name,bayes_network,Bayesian Network,
analysis_method.algorithms.name,Bayes Network,Bayesian Network,
analysis_method.algorithms.name,CS4.5,C4.5,typo; move to decision tree
analysis_method.algorithms.name,decision tree,Decision Tree,
analysis_method.algorithms.name,Decision tree,Decision Tree,
analysis_method.algorithms.name,decision_tree,Decision Tree,
analysis_method.algorithms.name,hidden markov model,Hidden Markov Model (HMM),
analysis_method.algorithms.name,Hidden Markov Model,Hidden Markov Model (HMM),
analysis_method.algorithms.name,HMM,Hidden Markov Model (HMM),
analysis_method.algorithms.name,hierarchical clustering,Hierarchical Agglomerative Clustering,actual algorithm used in paper is agglomerative
analysis_method.algorithms.name,hierarchical agglomerative,Hierarchical Agglomerative Clustering,
analysis_method.algorithms.name,k-means,K-means,
analysis_method.algorithms.name,K-Means,K-means,
analysis_method.algorithms.name,k-medoids,K-medoids,
analysis_method.algorithms.name,k-Nearest-Neighbors,K-nearest Neighbors (KNN),
analysis_method.algorithms.name,k nearest neighbors,K-nearest Neighbors (KNN),
analysis_method.algorithms.name,KNN,K-nearest Neighbors (KNN),
analysis_method.algorithms.name,logistic regression,Logistic Regression,
analysis_method.algorithms.name,mlp,Multilayer Perceptron (MLP),
analysis_method.algorithms.name,multi-class support vector machine,Support Vector Machine (SVM),
analysis_method.algorithms.name,Naïve Bayes,Naive Bayes,
analysis_method.algorithms.name,naive_bayes,Naive Bayes,
analysis_method.algorithms.name,Neural Nets,Neural Network,
analysis_method.algorithms.name,neural_network,Neural Network,
analysis_method.algorithms.name,neural network,Neural Network,
analysis_method.algorithms.name,random forest,Random Forest,
analysis_method.algorithms.name,Random forest,Random Forest,
analysis_method.algorithms.name,support vector machine,Support Vector Machine (SVM),
analysis_method.algorithms.name,Support Vector Machine,Support Vector Machine (SVM),
analysis_method.algorithms.name,svm,Support Vector Machine (SVM),
analysis_method.algorithms.name,SVM,Support Vector Machine (SVM),
analysis_method.algorithms.name,threshold comparison,Threshold based rule,
analysis_method.algorithms.name,Naïve Bayes with FCBF prefiltering,Naive Bayes,extra information is in subname
analysis_method.algorithms.name,Naïve Bayes with kernel density estimation,Naive Bayes,extra information is in subname
analysis_method.algorithms.name,Naïve Bayes with kernel density estimation and FCBF prefiltering,Naive Bayes,extra information is in subname
analysis_method.algorithms.metric,__normalized_euclidean,euclidean,was this here for any particular reason? remove duplicate
analysis_method.algorithms.tools.name,LS-SVM lab,LS-SVMlab,
analysis_method.algorithms.tools.name,Weka,WEKA,
analysis_method.algorithms.tools.name,weka,WEKA,
analysis_method.algorithms.tools.tool,weka,WEKA,v2.0 field name
analysis_method.algorithms.tools.name,own_matlab_tools,own_matlab_scripts,
analysis_method.algorithms.tools.name,sckit-learn,scikit-learn,
evaluation.methods.name,accuracy,Accuracy,
evaluation.methods.name,Accuracy of Classification,Accuracy,
evaluation.methods.name,correctly classified,Accuracy,
evaluation.methods.name,Clustering Accuracy,Accuracy,
evaluation.methods.name,accuracy by bytes,Byte Accuracy,
evaluation.methods.name,byte accuracy,Byte Accuracy,
evaluation.methods.name,detection rate,Detection Rate,
evaluation.methods.name,error_rate,Error Rate,
evaluation.methods.name,F-1,F1-Score,
evaluation.methods.name,F1,F1-Score,
evaluation.methods.name,f-1,F1-Score,
evaluation.methods.name,F1-score,F1-Score,
evaluation.methods.name,F1 Scores,F1-Score,
evaluation.methods.name,false alarm,False Positive Rate,
evaluation.methods.name,False alarm rate,False Positive Rate,
evaluation.methods.name,False positive rate,False Positive Rate,
evaluation.methods.name,false positive rate,False Positive Rate,
evaluation.methods.name,FP,False Positive Rate,
evaluation.methods.name,FPR,False Positive Rate,
evaluation.methods.name,FNR,False Negative Rate,
evaluation.methods.name,precision,Precision,
evaluation.methods.name,precission,Precision,
evaluation.methods.name,recall,Recall,
evaluation.methods.name,AUC,Area Under Curve/Receiver Operating Characteristics,
evaluation.methods.name,ROC,Area Under Curve/Receiver Operating Characteristics,
evaluation.methods.name,Receiver Operating Characteristics,Area Under Curve/Receiver Operating Characteristics,
evaluation.methods.name,ROC curve,Area Under Curve/Receiver Operating Characteristics,
evaluation.methods.name,ROC Curve,Area Under Curve/Receiver Operating Characteristics,
evaluation.methods.name,training time,Training Time,
evaluation.methods.name,Training Tme,Training Time,
evaluation.methods.name,Training time,Training Time,
evaluation.methods.name,TP,True Positive Rate,
evaluation.methods.name,true positive rate,True Positive Rate,
evaluation.methods.name,True positive rate,True Positive Rate,
evaluation.methods.name,true positives,True Positive Rate,
evaluation.methods.name,time of execution,Time,
evaluation.methods.name,time,Time,
evaluation.methods.metrics,_false_negative_rate,false_negative_rate,needs to be added to specification; check incomplete_confusion_matrix
evaluation.methods.metrics,_fnr,false_negative_rate,needs to be added to specification
evaluation.methods.metrics,_false_positive_rate,false_positive_rate,needs to be added to specification
evaluation.methods.metrics,_fpr,false_positive_rate,needs to be added to specification
evaluation.methods.metrics,__memory,_memory,
evaluation.methods.metrics,__mili_percentage_fdr,_mili_percentage_fdr,
evaluation.methods.metrics,_recall,recall,
result.main_goal,_traffic_rate_prediction,network_properties_monitoring,
result.subgoals,__feature_set_testing,_feature_set_testing,
result.subgoals,__hypothesis_testing,_hypothesis_testing,
result.claimed_improvements,__algorithm_testing,_algorithm_testing,
result.claimed_improvements,__focus_on_enterprise_network_traffic,_focus_on_enterprise_network_traffic,
result.claimed_improvements,__generic_solution,_generic_solution,
result.claimed_improvements,__method_viable,_method_viable,
result.claimed_improvements,__suitable_for_encrypted_traffic,_suitable_for_encrypted_traffic,
evaluation.methods.metrics,roc/uac,roc/auc,
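For anyone who wants to re-apply the table above to a paper, here is a minimal sketch of one way to do it. It is not the tooling used for this PR. It assumes a paper is loaded as nested dicts and lists, that the `field` column is a dotted path in which list levels are skipped, and that matching on `old` is exact and case-sensitive. The function names and the sample paper are made up for illustration.

```python
import csv
import io


def load_replacements(csv_text):
    """Parse field,old,new,note rows into {field: {old: new}}."""
    table = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        table.setdefault(row["field"], {})[row["old"]] = row["new"]
    return table


def apply_replacements(node, table, path=""):
    """Return a copy of `node` with string values replaced.

    `path` is the dotted key path to `node`; list items share the path
    of the list that contains them. Strings with no entry in the table
    are returned unchanged.
    """
    if isinstance(node, dict):
        return {
            key: apply_replacements(value, table, f"{path}.{key}" if path else key)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [apply_replacements(item, table, path) for item in node]
    if isinstance(node, str):
        return table.get(path, {}).get(node, node)
    return node


# Two rows copied from the list above, plus a hypothetical paper fragment.
SAMPLE_CSV = """field,old,new,note
data.datasets.name,geant,GÉANT,
analysis_method.algorithms.name,svm,Support Vector Machine (SVM),
"""

paper = {
    "data": {"datasets": [{"name": "geant"}]},
    "analysis_method": {"algorithms": [{"name": "svm"}, {"name": "C4.5"}]},
}

table = load_replacements(SAMPLE_CSV)
updated = apply_replacements(paper, table)
```

Because matching is exact, every spelling variant needs its own row, which is why the list contains separate entries for, e.g., `weka` and `Weka`.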