b-cube/semantics-preprocessing
initial text preprocessors for the triplestore and feature classification
Other
2 stars
3 forks
issues
Number | Title | Author | State | Age | Comments
#90 | let's get on with it | roomthily | closed | 9 years ago | 0
#89 | Let's just dump it back in | roomthily | closed | 9 years ago | 0
#88 | prep work for the module balkanization | roomthily | closed | 9 years ago | 0
#87 | Bag of Words + Unicode Decode Unicode cruft returns | roomthily | opened | 9 years ago | 3
#86 | New mimetypes for the list | roomthily | opened | 9 years ago | 0
#85 | Add HDF to the identification set | roomthily | closed | 9 years ago | 1
#84 | Normalize the null value handling in the JSON values | roomthily | opened | 9 years ago | 0
#83 | Append common query params to OGC endpoints | roomthily | closed | 9 years ago | 1
#82 | Add "actionable" kvp for endpoint URLs | roomthily | closed | 9 years ago | 1
#81 | For failed XML parsing, tag the identity JSON as is_error | roomthily | opened | 9 years ago | 0
#80 | Add identifiers for XSD and a couple of other unwanted response types | roomthily | opened | 9 years ago | 0
#79 | Add identifier EXCLUDE configuration option for parsers | roomthily | opened | 9 years ago | 0
#78 | Add language detection xpaths to configs | roomthily | opened | 9 years ago | 0
#77 | Remove binary image data embedded in the XML | roomthily | closed | 9 years ago | 1
#76 | Add some language element to the identify configuration | roomthily | closed | 9 years ago | 1
#75 | Add a startswith flag for the identification (consider better handling all around) | roomthily | closed | 9 years ago | 0
#74 | Finish compiling the mimetype "corpus" | roomthily | closed | 9 years ago | 3
#73 | Add THREDDS has_dataset & has_metadata to identification config | roomthily | closed | 9 years ago | 1
#72 | Add a Dublin Core identifier | roomthily | closed | 9 years ago | 1
#71 | Split identifier configs into smaller chunks and update processing to load some subset | roomthily | closed | 9 years ago | 1
#70 | Add some way of dealing with RDF if *not* a Dataset | roomthily | opened | 9 years ago | 1
#69 | Properly identify RDF Dataset responses | roomthily | closed | 9 years ago | 0
#68 | Deal with unescaped html in the XML responses | roomthily | closed | 9 years ago | 0
#67 | Update thredds endpoints to return full path as URL | roomthily | opened | 9 years ago | 5
#66 | Update keyword parsing to return a unique set | roomthily | closed | 9 years ago | 0
#65 | Add text post-processing to luigi pipeline | roomthily | closed | 9 years ago | 1
#64 | Create a generic router for the processors | roomthily | closed | 9 years ago | 1
#63 | Normalize the THREDDS endpoint keys | roomthily | closed | 9 years ago | 1
#62 | Add the nested dataset parsing for THREDDS catalogs | roomthily | closed | 9 years ago | 4
#61 | Deal with whitespace issues in the text() values | roomthily | closed | 9 years ago | 1
#60 | ISO/CSW confusion | roomthily | closed | 9 years ago | 0
#59 | Remove keyword normalization from the parsers | roomthily | closed | 9 years ago | 1
#58 | Add controlled vocab mapping for exceptions in OGC | roomthily | opened | 9 years ago | 0
#57 | Bag of Words: ignore URNs? | roomthily | opened | 9 years ago | 0
#56 | Classification: mimetype identification | roomthily | opened | 9 years ago | 1
#55 | Bag of Words: mimetypes! | roomthily | closed | 9 years ago | 1
#54 | Bag of Words: revise the cleanup to handle chars better | roomthily | closed | 9 years ago | 1
#53 | Bag of Words: exclude stopwords | roomthily | closed | 9 years ago | 1
#52 | Bag of Words: add POS tagging and POS filtering | roomthily | closed | 9 years ago | 1
#51 | Add the "split keyword items" step to the parser pipeline for triples | roomthily | closed | 9 years ago | 2
#50 | Store the response source URL in the JSON description | roomthily | closed | 9 years ago | 1
#49 | Remove unicode escape cruft from identification strings | roomthily | closed | 9 years ago | 2
#48 | Replace remainder tuples with dicts | roomthily | closed | 9 years ago | 1
#47 | Missing some FGDC (coris noaa) identifications | roomthily | closed | 9 years ago | 4
#46 | Exclude partial ISO records | roomthily | closed | 9 years ago | 1
#45 | Hey, It's a DS record - fails to parse correctly | roomthily | opened | 9 years ago | 1
#44 | Add the metadataStandardName to the ISO version blob | roomthily | closed | 9 years ago | 1
#43 | XPath fails with our xpaths irrespective of version searches | roomthily | closed | 9 years ago | 0
#42 | Add html encoded delimiters to the keyword normalization widgetry | roomthily | closed | 9 years ago | 1
#41 | Add id check for THREDDS config (as error!) | roomthily | closed | 9 years ago | 2