dxc-technology / OSSRank

A project for ranking and categorizing OSS projects

Moving MongoDB to bluemix or !!! #42

Open theroys opened 9 years ago

theroys commented 9 years ago

To store more projects in Mongo we need much more space. Rick offered to get us space in Bluemix. I will contact him about this.

fsiddiqi commented 9 years ago

Now that we've received bluemix access, we still need help :-)

theroys commented 9 years ago

@rickkwilhelm .. Thanks for arranging the access.

@fsiddiqi I have investigated some possibilities in Bluemix. They provide only one type of NoSQL (document-oriented) DB, Cloudant, which is based on CouchDB. It has quite good REST API access. To support multiple DB types we have to do a bit of rewriting (which is easier on the data collection side, as we only do add and update operations). On the visualization side we need to check whether all the queries we run can be implemented the same way. So it will basically be a rewrite to introduce a data access layer based on DB type, which can be switched by configuration.
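A minimal sketch of what such a configuration-switched data access layer could look like. Everything here is an assumption for illustration: the `ProjectStore` interface, the `db_type` config key, and the in-memory backend are hypothetical and not taken from the OSSRank code. A MongoDB or Cloudant backend would implement the same two methods around its own client.

```python
from abc import ABC, abstractmethod


class ProjectStore(ABC):
    """Hypothetical interface covering the operations mentioned in the thread."""

    @abstractmethod
    def upsert(self, project_id, document):
        """Add a project document, or update it if it already exists."""

    @abstractmethod
    def find(self, **criteria):
        """Return documents whose fields equal the given criteria."""


class InMemoryStore(ProjectStore):
    """Stand-in backend; a Mongo or Cloudant class would wrap its own client."""

    def __init__(self):
        self._docs = {}

    def upsert(self, project_id, document):
        # Merge new fields into the stored document (add-or-update semantics).
        merged = dict(self._docs.get(project_id, {}))
        merged.update(document)
        self._docs[project_id] = merged

    def find(self, **criteria):
        return [
            doc for doc in self._docs.values()
            if all(doc.get(key) == value for key, value in criteria.items())
        ]


# Registry keyed by the value of the (hypothetical) "db_type" config setting.
BACKENDS = {"memory": InMemoryStore}


def make_store(config):
    """Pick the backend named in the configuration."""
    db_type = config["db_type"]
    try:
        return BACKENDS[db_type]()
    except KeyError:
        raise ValueError(f"unsupported db_type: {db_type!r}") from None
```

The collection side would only ever call `upsert`, while the visualization side would go through `find`, so the queries that differ between databases stay inside each backend class.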

theroys commented 9 years ago

Just to note down what we heard in the Bluemix presentation:

  1. It is possible to bring an external runtime into Bluemix Dedicated, which could possibly be MongoDB. We need further investigation (Bluemix Public seems to support it).
  2. IBM Cloudant supports 1.6 TB of data space and 3k responses/sec.

rickkwilhelm commented 9 years ago

In the Bluemix presentation, they seemed a lot more supportive of Cloudant (vs. MongoDB). They seem to support Cloudant as their native NoSQL DB.

theroys commented 9 years ago

Now, trying to put things into perspective and thinking out loud:

  1. OSSRank stores its data as JSON documents in a NoSQL document DB.
  2. It queries the same data during visualization.
  3. It updates the same data during collection and ranking.

So any JSON-centric NoSQL document DB that supports normal and REST API access should suffice.

  1. Both MongoDB and Cloudant support the project requirements, so the project is DB-agnostic other than requiring a NoSQL DB.
  2. IBM claimed they can provide 1.6 TB of data storage for this and 3k responses/second.
  3. At the implementation level, the data layer can be abstracted to provide support for both Mongo and Cloudant.
  4. Bluemix's Python runtime can also be used for some of our cron jobs or batch runs.

So if we can get such a huge space (if @rickkwilhelm helps us :) ) and 3k responses/second (which immediately excites me about doing some parallel processing during data load, update, and query), I find it very interesting.

@fsiddiqi, what do you think about it?

@rickkwilhelm, would it be possible to get a large data store for Cloudant and a Python runtime in the dedicated Bluemix environment for CSC?

tjmcs commented 9 years ago

@theroys; the biggest issue I could see would be cost. I shared the pricing model I found online with @fsiddiqi and @rickkwilhelm earlier, but here are the basics from that pricing model when it comes to their NoSQL (Cloudant) offering:

Note that a Heavy API call in this sense is any call that updates, inserts, or deletes data from the database. A Light API call is any call that only reads data.

So the costs could be quite significant just for the storage if you're wanting 1.6TB of storage; by my math that would be a cost per month of $1,600.00 (assuming 1000GB in a TB and assuming that the first 20GB of storage used are a rounding error if you're planning on using TB of storage). I'm not sure if the per-transaction rates would add to that cost significantly, but I'm also not sure who would foot a $1,600.00 per month fee; the fees I charge to my own cost center total about $75.00 per month right now, and we're only paying $200.00 per month for the CSC organization on GitHub.
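The storage arithmetic above can be written out as a small sketch. The rate of $1.00 per GB per month is an assumption inferred from the $1,600.00 figure for 1.6 TB, not a quoted price, and the function and parameter names are hypothetical. Per-call charges are left out because their rates are not given here.

```python
GB_PER_TB = 1000  # same simplification as in the estimate above


def monthly_storage_cost(storage_tb, usd_per_gb_month=1.00, free_gb=0):
    """Estimate the monthly storage bill in USD.

    usd_per_gb_month is an assumed rate inferred from the thread's math;
    free_gb models a free allowance (0 treats it as a rounding error).
    """
    billable_gb = max(storage_tb * GB_PER_TB - free_gb, 0)
    return billable_gb * usd_per_gb_month


if __name__ == "__main__":
    print(f"1.6 TB, free tier ignored: ${monthly_storage_cost(1.6):,.2f}")
    print(f"1.6 TB, 20 GB free:        ${monthly_storage_cost(1.6, free_gb=20):,.2f}")
```

Counting the 20 GB allowance changes the estimate by only $20.00 under the assumed rate, which is why it can be treated as a rounding error at this scale.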

Thoughts from others on this thread?

theroys commented 9 years ago

@tjmcs, thanks for sharing the pricing model. It is surely the biggest issue. I would say 10-20 GB of storage will suffice for a start. However, Heavy API calls add up even for 10,000 documents/projects with daily updates (e.g. daily Twitter data).
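As a rough illustration of how the write volume adds up, here is a sketch that only counts calls. It assumes one Heavy API call per document update and a 30-day month; both are assumptions, and the function name is hypothetical. No price is applied because the per-call rate is not stated in this thread.

```python
def monthly_heavy_calls(documents, updates_per_day=1, days_per_month=30):
    """Count write calls per month, assuming one Heavy call per update."""
    return documents * updates_per_day * days_per_month


if __name__ == "__main__":
    # 10,000 projects, each updated once a day.
    print(monthly_heavy_calls(10_000))  # 300000
```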

tjmcs commented 9 years ago

@theroys; agree completely. We may have to rethink how we deploy that database server based on those sorts of costs...