fossasia / labs.fossasia.org

Projects Website for FOSSASIA http://labs.fossasia.org

News-source-finder with loklak #175

Closed: niccokunzmann closed this issue 5 years ago

niccokunzmann commented 7 years ago

Many newspapers and news websites copy news from the same sources. This raises several questions:

We already have the search engine loklak in place. It can be used as a source of information. Can we use it to answer such questions?
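As a starting point for using loklak as the data source, here is a minimal sketch of building a search request and turning a response into message records. It is an assumption-laden illustration, not loklak's documented client: the base URL `https://api.loklak.org/api/search.json`, the `q` and `count` parameters, and the response fields `statuses`, `screen_name`, `text` and `created_at` are all assumed here and should be checked against the loklak server documentation. `Message`, `build_search_url` and `parse_statuses` are hypothetical names.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from datetime import datetime
from urllib.parse import urlencode

# ASSUMPTION: endpoint and parameter names are illustrative only.
LOKLAK_BASE = "https://api.loklak.org/api/search.json"


@dataclass(frozen=True)
class Message:
    source: str          # account or site that published the message
    text: str
    created_at: datetime


def build_search_url(query: str, count: int = 100) -> str:
    """Build a query URL for a loklak-style search endpoint (assumed parameters)."""
    return LOKLAK_BASE + "?" + urlencode({"q": query, "count": count})


def parse_statuses(raw_json: str) -> list[Message]:
    """Convert a loklak-style JSON response into Message objects.

    ASSUMPTION: the payload has a "statuses" list whose entries carry
    "screen_name", "text" and an ISO-8601 "created_at" (possibly ending in "Z").
    """
    payload = json.loads(raw_json)
    messages = []
    for status in payload.get("statuses", []):
        # fromisoformat() before Python 3.11 does not accept a trailing "Z".
        stamp = status["created_at"].replace("Z", "+00:00")
        messages.append(
            Message(status["screen_name"], status["text"], datetime.fromisoformat(stamp))
        )
    return messages
```

The sketch performs no network access, so it can be tested offline; fetching the URL (for example with `urllib.request.urlopen`) is left out on purpose.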

abishekvashok commented 7 years ago

The one that comes first, judging by its timestamp, can be taken as the original source. When this news is copied and published again, it also becomes available on loklak, so we could query for the same news and find the duplicates...
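The earliest-timestamp idea can be sketched as follows, assuming messages have already been collected (for instance from loklak) into simple records. Copies are grouped here by a crude normalization (lower-case, URLs and punctuation removed), so only near-verbatim copies are matched; `Message`, `normalize` and `original_sources` are hypothetical names, not part of loklak.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Message:
    source: str
    text: str
    created_at: datetime


def normalize(text: str) -> str:
    """Lower-case, drop URLs and punctuation, and collapse whitespace."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())


def original_sources(messages):
    """Map each normalized text to the earliest message that carries it."""
    groups = defaultdict(list)
    for msg in messages:
        groups[normalize(msg.text)].append(msg)
    # The earliest timestamp in each group is treated as the original.
    return {key: min(group, key=lambda m: m.created_at) for key, group in groups.items()}
```

One caveat on the design: "earliest in the index" only means earliest among the messages loklak has actually collected, so a source that was never crawled can still be the true origin.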

wongalvis commented 7 years ago

Hi, I'm interested in this project for GSoC. I have a web development background with expertise in JavaScript (jQuery and Angular), HTML/CSS and UI/UX design. I agree this is a huge concern on social platforms. I would be eager to look into existing solutions to these issues and perhaps come up with some new ideas specific to this tool. Please let me know how I can get further technical details for this project. Thank you.

niccokunzmann commented 7 years ago

@wongalvis Sure, I should have put links there in the first place.

There is the loklak organization (https://github.com/loklak/), which has several repositories:

What do you think?

wongalvis commented 7 years ago

I'm a little confused by the "sources" in the first point: are you referring to the sources of tweets or to external data? And in the third point, do you mean that several API calls can be fired in parallel?

All in all this sounds really interesting!

kamsuri commented 7 years ago

Hi @niccokunzmann, I'm interested in this project for GSoC'17. Please let me know how I can start contributing to it. Thank you.

bharatkashyap commented 7 years ago

Hi @niccokunzmann. Are you looking to build this tool as something that works on social media feeds? (For instance, as a Chrome extension that points out the sources behind news articles in one's Twitter feed.)

niccokunzmann commented 7 years ago

Hi @bharatkashyap @kamsuri @wongalvis,

Now more than four people are interested in this idea, so I have formulated it as a research idea. The question for me is: can we detect hidden copies, word trends and the people behind them?
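"Hidden copies" are usually reworded slightly, so exact matching is not enough. A common technique for near-duplicate detection is to split each text into overlapping k-word shingles and compare the shingle sets with Jaccard similarity. The sketch below assumes plain message texts as input; `k = 3` and the `0.5` threshold are illustrative choices, not tuned values, and the function names are hypothetical.

```python
import re


def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word shingles (tuples of lower-cased words)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|, defined as 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def likely_copies(texts, threshold=0.5, k=3):
    """Return index pairs (i, j) whose shingle overlap reaches `threshold`."""
    sets = [shingles(t, k) for t in texts]
    pairs = []
    for i in range(len(sets)):
        for j in range(i + 1, len(sets)):
            if jaccard(sets[i], sets[j]) >= threshold:
                pairs.append((i, j))
    return pairs
```

For example, "Reports say the city council approved the new budget on Monday evening" shares 8 of its 10 shingles with "The city council approved the new budget on Monday evening", giving a similarity of 0.8. The pairwise loop is quadratic; at loklak scale, MinHash with locality-sensitive hashing is the usual way to avoid comparing every pair.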

As I understand it, loklak can be modified to query all sorts of sources, search through them and provide an open interface. So I see these possibilities for working on this question:

Why I chose loklak in the description:

This, I think, is what we need to answer such a question.

What I think could be the next steps:

So the question to you is: What are you interested in and what would you like to do?

wongalvis commented 7 years ago

@niccokunzmann Social and current-affairs applications are among the topics in software development that interest me most. There are existing implementations based on the official Twitter API that trace old tweets efficiently, and I am thinking of using similar methods to calculate the likelihood that a particular tweet is the original source of the news.

After gathering sufficient data, we could move on to analyzing which sources are regular "copycats" and which are authoritative. Of course, this could also be done by starting with known news sources such as the New York Times.

Then we can model it based on patterns found in common trustable sources in contrast to sources that regularly copy news.
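One simple pattern to start such a model with is how often each source is first when it reports a story. The sketch assumes stories have already been grouped into clusters of copies (for example by the near-duplicate step discussed in this thread), with each cluster given as `(source, timestamp)` pairs; `originality_scores` is a hypothetical name and the score is just an illustrative ratio.

```python
from collections import Counter
from datetime import datetime


def originality_scores(clusters):
    """For each source, the share of its story clusters in which it published first.

    `clusters` is a list of story clusters; each cluster is a list of
    (source, timestamp) pairs for messages judged to be copies of one story.
    """
    appeared = Counter()  # number of clusters each source appears in
    first = Counter()     # number of clusters each source published first
    for cluster in clusters:
        if not cluster:
            continue
        for source in {src for src, _ in cluster}:
            appeared[source] += 1
        earliest_source, _ = min(cluster, key=lambda item: item[1])
        first[earliest_source] += 1
    return {src: first[src] / appeared[src] for src in appeared}
```

A score near 1.0 suggests an authoritative source and a score near 0.0 a regular copycat. Sources seen in only a few clusters get noisy scores, so some minimum number of stories per source would be sensible before trusting the ratio.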

While the official Twitter API has plenty of limitations, loklak would let us analyze the data freely.

I am interested in and would be able to handle all of the data gathering, news mining and modelling parts. I'm aware that the latter two require more research, so more reading and study would be needed beforehand. Nonetheless, I am experienced in JavaScript and C++, so using the API to gather data and running simple analyses wouldn't be too hard.

It depends on the timeframe, so it would be better to settle the details of what has to be done first.

Cheers!

kamsuri commented 7 years ago

@niccokunzmann In my view, loklak is a good choice for collecting data, since working with the Twitter API would be hard because of its many limitations. I have gone through the loklak server's details and concluded that it will be easier to extend it to our requirements.
Then comes the analysis of the collected data. Here I need to dig deeper before reaching final conclusions, but it should not be too difficult, as I have worked with probabilistic analysis before. A model for finding trusted sources would be based on a few perspectives, such as the timestamp, the source's past record in our model, and word analysis. This list will be finalized after analyzing a few more aspects of the problem. I also had the idea that we could use something like reverse image search, as Google offers. I am interested in this project and would like to contribute to it. I would like to start with the model design, as it will be a major and challenging part of this project; later, I think, we can extend the loklak server to meet our requirements. So before starting, I think we first need to frame the project's design properly. Looking forward to contributing to this project.
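To make the "few perspectives" concrete, one possible shape for such a model is a logistic combination of per-source features into a 0 to 1 trust score. Everything here is a hypothetical illustration: the three features mirror the timestamp, past record and word-analysis ideas above, and the weights are invented placeholders that would have to be fitted on labelled examples (for instance with logistic regression) rather than chosen by hand.

```python
import math
from dataclasses import dataclass


@dataclass
class SourceFeatures:
    minutes_after_first: float  # delay behind the earliest known copy of the story
    past_originality: float     # 0..1, share of past stories this source published first
    word_overlap: float         # 0..1, max similarity to any earlier copy (0 if first)


# HYPOTHETICAL weights, for illustration only; not fitted to any data.
WEIGHTS = {"bias": 0.0, "delay": -0.05, "past": 3.0, "overlap": -2.0}


def trust_score(f: SourceFeatures, w=WEIGHTS) -> float:
    """Combine the features linearly and squash the result with a logistic function."""
    z = (w["bias"]
         + w["delay"] * f.minutes_after_first
         + w["past"] * f.past_originality
         + w["overlap"] * f.word_overlap)
    # Numerically stable logistic: avoid math.exp overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```

With these placeholder weights, a source that is first, usually original and shares no wording with earlier copies scores about 0.94, while one that posts 90 minutes later with near-identical wording scores close to 0.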