RD-Connect / vcfLoader

From gVCFS to ElasticSearch through SparkSQL-Hive-Parquet

streaming #19

Closed dpiscia closed 8 years ago

dpiscia commented 8 years ago

Linked to branch streaming. It works, but there is still one problem: the RDDstream contains all the new files, so it has trouble assigning the data to the corresponding dynamic partition (chromosome).

The chromosome can be filtered from the data, but doing the same for the sample name is more difficult.
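
To make the partitioning problem concrete, here is a minimal plain-Python sketch (not Spark, and not the code on the streaming branch). It assumes each record in a batch still carries the path of the file it came from, and that the sample name can be read from the file name up to the first dot; both are assumptions for illustration only. The helper names `sample_from_path` and `partition_batch` and the example data are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def sample_from_path(path):
    # Assumed naming convention: the sample name is the file name up to the first dot.
    return PurePosixPath(path).name.split(".")[0]


def partition_batch(records):
    # Group (path, vcf_line) pairs from one mixed batch by (chromosome, sample).
    partitions = defaultdict(list)
    for path, line in records:
        if line.startswith("#"):
            continue  # skip VCF header lines
        chrom = line.split("\t", 1)[0]  # CHROM is the first tab-separated VCF column
        partitions[(chrom, sample_from_path(path))].append(line)
    return dict(partitions)


# Made-up batch mixing lines from two new files.
batch = [
    ("/in/sampleA.g.vcf", "#CHROM\tPOS\tID"),
    ("/in/sampleA.g.vcf", "1\t100\t.\tA\tG"),
    ("/in/sampleB.g.vcf", "1\t100\t.\tA\tT"),
    ("/in/sampleB.g.vcf", "2\t250\t.\tC\tT"),
]
parts = partition_batch(batch)
```

The chromosome comes straight from the record, while the sample name is only available here because the file path travels with each line. If the stream drops the path, that second key is lost, which is the difficulty described above.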

dpiscia commented 8 years ago

I'll close this and open a more specific issue.