cncf / cluster

🖥🖥🖥🖥CNCF Community Cluster
https://cncf.io/cluster

TF #50

Closed tfmolch closed 6 years ago

tfmolch commented 6 years ago

If you are interested in filing a request for access to the CNCF CIL, please fill out the details below.

If you are just filing an issue, ignore/delete those fields and file your issue.

First Name

Thomas

Last Name

Felder

Email

tf@molch.at

Company/Organization

N/A

Job Title

dev

Project Title

analyticssandbox

Briefly describe the project

data-mining / social sentiment analysis experiments / container orchestration etc.

Which members of the CNCF community and/or end-users would benefit from your work?

data scientists

Is the code that you’re going to run 100% open source? If so, what is the URL or URLs where it is located?

yes – https://github.com/tfmolch

What kind of machines and how many do you expect to use (see: https://www.packet.net/bare-metal/)?

3 midsize VMs

What OS and networking are you planning to use (see: https://help.packet.net/technical/infrastructure/supported-operating-systems)?

debian

Please state your contributions to the open source community and any other relevant initiatives

none worth mentioning yet – I'm a self-taught programmer!

How will this testing advance cloud native computing (specifically containerization, orchestration, microservices or some combination).

well at the very least I'll be testing/bug reporting

Any other relevant details we should know about while preparing the infrastructure?

SAN storage shared between the nodes would be a plus!

dankohn commented 6 years ago

Sorry, we need a little more specificity on exactly what repos you're planning to run and what you want to do with them.

-- Dan Kohn dan@linuxfoundation.org Executive Director, Cloud Native Computing Foundation https://www.cncf.io +1-415-233-1000 https://www.dankohn.com


tfmolch commented 6 years ago

Certainly. The plan is to build a user-friendly way of accessing real-time sentiment data. It should work in more than one language and use whatever feed-forward net is state of the art within the given domain. Results will be generated from pre-mined datasets (for common search terms) combined with "best-effort"/democratic retargeting. User input combined with another net should deliver fairly reliable trend detection. In other words, whatever users search for, tweet, or post about retargets a Python (possibly Scala) mining cluster on Spark. That should allow the software to deliver indicative results for any given query, and more exact results for trends.

I should add that there will likely be some sort of commercial API to hook up fintech platforms (Eikon, homebrew, what have you). I'm aware that such a solution could feasibly be implemented using existing "toolchains" on IBM Cloud, for instance, but even with the present anything-you-want-as-a-service offerings, sentiment data remains largely unavailable to non-technical people.

It's very early days still, so I can't provide you with a repository URL. I'd use the VMs to see whether our current implementation is (at all) scalable. Many of the things above are nowhere near complete and subject to change; they are evolving along with the industry.

dankohn commented 6 years ago

Unfortunately, I need to turn this down for now. The CIL is meant to test scalability of actual working code, not be the primary development machines for new software creation.

Please get to the point of having an MVP in a GitHub repo that works with smaller amounts of data, and then open a new issue asking for CIL resources to demonstrate that your MVP scales.

tfmolch commented 6 years ago

Apparently I miscommunicated. There is actual working code. It's just not multi-index, multi-language, or multi-analyzer: it runs with a naive Bayes analyzer, English is OK, and German is a proof of concept at best. The frontend needs CSS fixes, and the backend is spread across a number of machines running slightly different setups. While the indexes are structurally sound, there's no unified codebase. I've also found Python code rather difficult to compile into binaries.

In any case, you've encouraged me to do a round of cleanup before further exploratory ML-related coding.
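For readers unfamiliar with the naive Bayes analyzer mentioned above, here is a minimal single-machine sketch of the general technique. It is not the code from this project, which the thread does not show. The class name `NaiveBayesSentiment`, the whitespace tokenizer, the add-one smoothing, and the toy training data are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSentiment:
    """Multinomial naive Bayes over whitespace tokens with add-one smoothing.

    Hypothetical illustration only; not taken from the project discussed here.
    """

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()                       # all tokens seen in training

    @staticmethod
    def tokenize(text):
        # Naive tokenizer: lowercase and split on whitespace.
        return text.lower().split()

    def fit(self, samples):
        # samples is an iterable of (text, label) pairs.
        for text, label in samples:
            tokens = self.tokenize(text)
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in self.tokenize(text):
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented for this example.
TRAINING = [
    ("great product love it", "pos"),
    ("really good and great value", "pos"),
    ("terrible service hate it", "neg"),
    ("bad and awful experience", "neg"),
]

model = NaiveBayesSentiment().fit(TRAINING)
print(model.predict("love this great thing"))
print(model.predict("awful terrible thing"))
```

A classifier like this works on one language at a time because its vocabulary comes entirely from the training text, which is one reason a German model would need its own labeled data.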

dankohn commented 6 years ago

Sounds good. Please come back with a single repo (which can certainly pull from other repos) that includes a Readme describing what it's doing, that it works well on a single machine, etc. And then we'd be happy to assign you resources to enable you to do scale testing.