Closed tthuem closed 7 years ago
Do you have an example entry that does not match the first result, so that the implementation can be tested against it?
@inproceedings{PTF+:SPLC16,
author = {Pfofe, Tristan and Th\"um, Thomas and Fenske, Wolfram and Schulze, Sandro and Schaefer, Ina},
title = {{Synchronizing Software Variants with VariantSync}},
booktitle = SPLC,
publisher = ACM,
address = NY,
year = 2016,
note = {To appear}
}
This is implemented with a pattern built at run time, so the title must now appear in the specific result element. On the other hand, this pattern makes the scraping fragile against HTML changes.
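A minimal sketch of how such a run-time pattern could be built, not the code from the repository. The function names are hypothetical; the `gs_r` class and the "Cited by" text are taken from the pattern quoted in this thread, and the HTML in the usage note is made up for illustration.

```javascript
// Escape characters that have a special meaning in a regular expression,
// so that titles such as "C++ (Intro)" are matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a pattern at run time that accepts a "Cited by N" count only if
// the given title occurs between a result block opener and the count.
// Assumption: result blocks start with <div class="gs_r">.
function buildCitedByPattern(title) {
  return new RegExp(
    '<div class="gs_r">.*?' + escapeRegExp(title) + ".*?Cited by (\\d+)",
    "isu" // i: ignore case, s: let "." match newlines, u: Unicode mode
  );
}

// Return the citation count as a number, or null if nothing matched.
function extractCitedBy(html, title) {
  const match = buildCitedByPattern(title).exec(html);
  return match ? Number(match[1]) : null;
}
```

For example, `extractCitedBy(html, "Proof-Carrying Code")` yields the count that follows that title in `html`, or `null` if the title does not occur. Because the pattern depends on the class name and on the literal "Cited by" text, any change to the page markup breaks it, which is the weakness mentioned above.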
Seems to work now.
Almost all entries are now reported as not found on Google Scholar. Many articles that were found before are now updated to -2, which does not make sense.
Currently I have this pattern:
/<div class="gs_r">.*The Coq Proof Assistant Reference Manual.*Cited by (\d*).*<\/div><\/div>.*/igu
, but I identified some problems with the new pattern at run-time:
Problems 1 and 2 could only be solved by removing the title from the pattern again, but that would bring back the false positives. For 3 and 4 we could calculate an average or take only the largest citation count. What do you think?
Please revert the changes for now, as it worked better before.
Then I looked into the wrong cases, and it seems to be a matter of different capitalization. As I said, this only makes sense if we rely on edit distance and do not compare titles exactly.
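One way to keep capitalization from counting as a mismatch is to normalize both titles before comparing them. This is only a sketch: `normalizeTitle` is a hypothetical helper, and dropping punctuation is an extra assumption, based on the stored titles in this thread having no colons.

```javascript
// Lower-case the title and collapse every run of characters that are
// neither letters nor digits into a single space.
function normalizeTitle(title) {
  return title
    .toLowerCase()
    .replace(/[^\p{L}\p{N}]+/gu, " ")
    .trim();
}
```

With this, two spellings that differ only in case, hyphens, or colons normalize to the same string, so only real differences contribute to the edit distance.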
Examples:
"SDTF:TOPLAS13";"Contracts for First-Class Classes";20;1462821710040;
"H:FMCO13";"The Abstract Behavioral Specification Language A Tutorial Introduction";11;1462822911497;
"AKS+:FOSD13";"Exploring Feature Interactions in the Wild The New Feature-interaction Challenge";12;1462824113296;
"BRN+:VaMoS13";"A Survey of Variability Modeling in Industrial Practice";117;1462825315393;
"SRA:GPCE13";"Family-Based Performance Measurement";10;1462826517458;
"GBC+:ESECFSE13";"Incrementally Synthesizing Controllers from Scenario-Based Product Line Specifications";14;1462827718995;
"GSCH:REJ13";"Features Meet Scenarios Modeling and Consistency-Checking Scenario-Based Product Line Specifications";8;1462828920680;
"ABT+:REJ13";"Evaluating Scenario-Based SPL Requirements Approaches The Case for Modularity, Stability and Expressiveness";6;1462830122547;
"HS:SC13";"Reusable Components for Lightweight Mechanisation of Programming Languages";2;1462831324311;
"MRG:GPCE13";"Investigating Preprocessor-Based Syntax Errors";19;1462832525507;
"MRKN:iFM13";"Compositional Verification of Software Product Lines";17;1462833727130;
"BDS13";"Compositional Type Checking of Delta-Oriented Software Product Lines";23;1462834928447;
results in
"SDTF:TOPLAS13";"Contracts for First-Class Classes";-2;1466613906166;
"H:FMCO13";"The Abstract Behavioral Specification Language A Tutorial Introduction";-2;1466615136092;
"AKS+:FOSD13";"Exploring Feature Interactions in the Wild The New Feature-interaction Challenge";-2;1466616364571;
"BRN+:VaMoS13";"A Survey of Variability Modeling in Industrial Practice";-2;1466617592389;
"SRA:GPCE13";"Family-Based Performance Measurement";-2;1466618821484;
"GBC+:ESECFSE13";"Incrementally Synthesizing Controllers from Scenario-Based Product Line Specifications";-2;1466620050393;
"GSCH:REJ13";"Features Meet Scenarios Modeling and Consistency-Checking Scenario-Based Product Line Specifications";-2;1466621279027;
"ABT+:REJ13";"Evaluating Scenario-Based SPL Requirements Approaches The Case for Modularity, Stability and Expressiveness";-2;1466622512765;
"HS:SC13";"Reusable Components for Lightweight Mechanisation of Programming Languages";-2;1466623739889;
"MRG:GPCE13";"Investigating Preprocessor-Based Syntax Errors";-2;1466624967417;
"MRKN:iFM13";"Compositional Verification of Software Product Lines";-2;1466626194573;
"BDS13";"Compositional Type Checking of Delta-Oriented Software Product Lines";-2;1466627422731;
I uploaded 1b2bbd50ce4fa6fa2189d5d01dd795adf7382b39 as a fix for this issue. It searches over all results and takes the best fit by Levenshtein distance. After running for more than 3 hours, there were no more -2 values or other errors. The results are also similar to or better than those from the cited search only.
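The idea of searching all results for the best fit can be sketched as follows. This is an illustration under assumptions, not the code from the commit: the result objects with `title` and `citedBy` fields and the function names are hypothetical.

```javascript
// Classic dynamic-programming Levenshtein distance, keeping only two rows.
function levenshtein(a, b) {
  // prev[j] is the distance between the processed prefix of a
  // and the first j characters of b.
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,       // deletion
        curr[j - 1] + 1,   // insertion
        prev[j - 1] + cost // substitution or match
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Return the result whose title is closest to the wanted title,
// together with its distance, or null if there are no results.
function bestMatch(title, results) {
  let best = null;
  for (const result of results) {
    const distance = levenshtein(
      title.toLowerCase(),
      result.title.toLowerCase()
    );
    if (best === null || distance < best.distance) {
      best = { ...result, distance };
    }
  }
  return best;
}
```

Comparing lower-cased titles is a choice made here for the sketch; the returned `distance` lets the caller decide whether the best candidate is close enough to accept.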
Looks better, but still wrong results for
Advanced Compiler Design and Implementation
Structure and Interpretation of Computer Programs
Proof-Carrying Code
The last is a tough one, as the title is actually wrong in Google Scholar. I can live with the last one, but what is wrong with the other two?
The issue was closed automatically, but the problem is still there.
I improved the search with another regular expression that filters the entries before trying to match the citations. Please check if it works for you. You can change the levenshteinParameter too. In my tests with the above-mentioned entries and also the normal list, it works pretty well with 10% word changes allowed.
Seems to work fine.
I would recommend using the edit distance and checking that it is smaller than titleLength/5. Otherwise, store -2, as the paper is then essentially not found.
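The suggested acceptance rule could look like the sketch below. The function and constant names are hypothetical, and the edit distance is assumed to be computed elsewhere and passed in.

```javascript
// Marker stored when a paper is essentially not found (value from this thread).
const NOT_FOUND = -2;

// Accept the citation count only if the edit distance between the wanted
// title and the matched title is smaller than a fifth of the title length.
function citationCountOrNotFound(title, editDistance, citedBy) {
  return editDistance < title.length / 5 ? citedBy : NOT_FOUND;
}
```

For a 19-character title such as "Proof-Carrying Code" the limit is 3.8, so a distance of 3 is accepted and a distance of 4 is stored as -2.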