chrismattmann/etllib
This is the ETL lib package. It provides an API to munge and prepare JSON, TSV and other data using Apache Tika and JSON parsing/loading for ETL via Apache OODT (or other libs) into Apache Solr.
16 stars, 35 forks
issues
Support for Python 3.7 (See commit comments for detailed changes)
#69
Anthonyive
closed
1 year ago
1
Anthony branch
#68
Anthonyive
closed
3 years ago
0
Support for Python == 3.7
#67
Anthonyive
closed
3 years ago
0
Fix for "local variable 'val' referenced before assignment" problems
#66
YeleiWu
closed
2 years ago
0
added check to ensure conversion happens when encoding is not specified
#65
adityasundaram
closed
6 years ago
2
Expose threshold for near duplicates. This closes #63.
#64
chrismattmann
closed
8 years ago
0
Make threshold for tsvtojson near duplicates configurable from command line
#63
chrismattmann
closed
8 years ago
0
Rename imagesimilarity to similarity. Remove buildout. This closes #58.
#62
chrismattmann
closed
8 years ago
0
upgrade APIs to latest Tika. this closes #57. accept 2 digit dates this closes #59. allow translatejson to cache results in REDIS like DB this closes #60.
#61
chrismattmann
closed
8 years ago
0
Allow translatejson to cache results in REDIS-like file DB
#60
chrismattmann
closed
8 years ago
0
Clean up date formatting bug (accept dates with 2 digit numbers)
#59
chrismattmann
closed
8 years ago
0
Publish to pip and remove buildout
#58
chrismattmann
closed
8 years ago
0
Upgrade APIs to latest version of Tika
#57
chrismattmann
closed
8 years ago
0
Installation: python bootstrap.py fails
#56
harsham05
closed
8 years ago
15
fix missing comma.
#55
chrismattmann
closed
9 years ago
0
change help message
#54
dongnizh
closed
9 years ago
0
change for bunch of files
#53
dongnizh
closed
9 years ago
1
Update the README to include imagesimilarity.py
#52
chrismattmann
closed
9 years ago
2
delete similarity and clusterscores script
#51
dongnizh
closed
9 years ago
3
merge image similarity into ETLlib
#50
dongnizh
closed
9 years ago
2
Format regardless
#49
dongnizh
closed
9 years ago
1
merge tika-image-similarity into ETLlib
#48
dongnizh
closed
9 years ago
3
fixed can be applied for value-based similarity
#47
dongnizh
closed
9 years ago
1
Up to date with latest updates
#46
dongnizh
closed
9 years ago
1
fixed encoding problem
#45
dongnizh
closed
9 years ago
1
fix clusterscores cmd line.
#44
chrismattmann
closed
9 years ago
0
-added -c option
#43
dongnizh
closed
9 years ago
2
CS572HW1
#42
pk-pranshu
closed
9 years ago
2
572_HW1_Extra_credit
#41
mengxian-li
closed
9 years ago
1
integrated into similarity.py and cluster-scores.py
#40
dongnizh
closed
9 years ago
1
changed -c option and delete write-in-file function
#39
dongnizh
closed
9 years ago
1
Img similarity
#38
dongnizh
closed
9 years ago
1
bootstrap.py failing to complete
#37
archerrbgh
closed
9 years ago
5
Error when running buildout
#36
dongnizh
closed
9 years ago
2
First steps towards #33 - refactor encoding detection into etllib.py and leverage in tsvtojson.py
#35
chrismattmann
closed
9 years ago
1
Update poster.py
#34
smritish
closed
9 years ago
1
repackageandpost gives UnicodeDecodeError when the input JSON file has different encoding
#33
abhinandkr
closed
9 years ago
7
adding date format checks
#32
georgejose
closed
9 years ago
1
tsvtojson fails when the output JSON file already exists
#31
abhinandkr
closed
9 years ago
4
Solr url for poster
#30
smritish
closed
9 years ago
10
adding useful commandline error msgs
#29
georgejose
closed
9 years ago
1
unknown encoding error, because of wrong file definition
#28
srimadha
closed
9 years ago
2
fix for duplicate output for each unique record
#27
gsrika
closed
9 years ago
1
unknown encoding: binary:0:Message: unknown encoding error while running tsvtojson
#26
srimadha
closed
9 years ago
4
possible bug - tsvtojson prints each unique record twice??
#25
gsrika
closed
9 years ago
2
Update to New Package: Error running the tsvtojson
#24
siddharthasandhu
closed
9 years ago
5
Pythonic implementation of tsvtojson
#23
siddharthasandhu
closed
9 years ago
2
command g++ failed at the last two steps installing jcc
#22
sun736
closed
9 years ago
5
updating README to address missing libmagic in darwin
#21
georgejose
closed
9 years ago
1
libmagic missing in mac osx
#20
georgejose
closed
9 years ago
2