The things we have to think.

blueworrybear commented 12 years ago

Guys:

After reading some books and talking with a few people, I found that we have some misunderstandings about data mining.

In fact, the big problem is that we still don't know the framework of the project. In other words, we don't know exactly what we should do. Here are some questions.

1. Since we want the magazine to have categories, how do we know which categories we should have?
   - This question is really about where we should apply data mining.
2. Assuming we know our categories, how do we collect the articles that relate to each category?
3. Once we have collected lots of articles related to a given category, how do we build a magazine from them?
   - In other words, what is the structure of a magazine?

Answering these questions will help us determine which methodologies to use. Once we have answers, we will be able to design our framework.

Actually, I have tried to imagine what the framework looks like. The first several steps seem to have nothing to do with data mining: all we have to do is collect data and compute TF*IDF. Finally, let's think clearly about what we should do, not just about what we want in the end. The process for reaching our goal should be worked out clearly.
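
To make the TF*IDF step concrete, here is a minimal sketch in plain Python. It assumes the articles are already tokenized into word lists; the function name `tf_idf` and the sample articles are made up for illustration and are not part of the project. It uses the common definition: term frequency within the article, multiplied by the log of (number of articles / number of articles containing the term).

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            # TF = count / document length, IDF = log(N / df)
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical sample data, only to show how the function is called.
articles = [
    "noodle soup recipe with beef".split(),
    "beef steak recipe".split(),
    "train travel guide".split(),
]
scores = tf_idf(articles)
for term, weight in sorted(scores[0].items(), key=lambda kv: -kv[1]):
    print(f"{term}: {weight:.3f}")
```

Words that appear in only one article (such as "noodle") score higher than words shared between articles (such as "recipe"), and a word that appears in every article scores zero. Chinese articles would need a word segmentation step before this, which the sketch does not cover.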

kevinLoe commented 12 years ago

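The third question is really hard, as I have said before. As for categories, could we start with just one item, or does everything have to be defined up front? I think it would be easier to build one first (e.g. food) and add more later. Also, this afternoon I spent a long time searching the library. There was only one data mining book (in Chinese), and it was extremely old. No JDM, let alone Hadoop. What a shabby library = =. After that I went to 水木書院, where I saw everything except JDM, plus a book called "網路爬蟲" (Web Crawlers). I flipped through it and it seemed useful, but I'm not sure whether it is relevant.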

blueworrybear commented 12 years ago

Response

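If this project implements only one category, it means the project is focused on "how to produce a well-structured magazine."

I'm not sure whether that makes sense to everyone.

In other words, the question is: what is the main thing we want to study?

Is it how to collect articles related to a given topic, or how to edit a pile of articles on a topic into an organized magazine?

Of course our output has to be "a magazine," but we need to think clearly about what we expect the focus of this project to be.

Collecting related articles and organizing them may both be things we implement, but within our framework, which one should take up most of our time?

If we build the project around a single topic, our demo will be judged on how good our logic for assembling the magazine is.

If instead we cover many different topics, the focus will be on whether the articles we collect are actually relevant to their topics.

The point is: what result do we want? What goals do we want to achieve?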

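The goals I mean here are not about how our app should look.

Rather: which capabilities do we need to research?

For example, collecting articles related to "a given" topic is itself an area that needs research.

That goal alone might be a paper-level result.

In other words, if we achieve it, we can gather related articles for many topics.

But achieving it does not mean we can assemble a complete magazine.

So organizing the articles would be the next goal in the framework.

To sum up, what I'm trying to say is that asking "can we do just one category?" is not specific enough.

The point is what we want to do within "that one category."

Just classify articles? Emphasize the logical connections between articles? Or something else...?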

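I have only now understood the question Professor 陳宜欣 asked me last time.

The professor was asking what content and structure we actually expect the magazine to have.

If all we require is that the content be related to the topics we define, that may not fall within the scope of data mining at all.

Of course, there is one more related question: where does our data come from?

The professor asked because different ways of obtaining data call for different techniques.

If we collect articles by crawling with keywords related to the topic, is there any need to classify them again?

If instead our data set is just a raw pile of articles, then perhaps we do need data mining to classify them.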
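
To illustrate the second case (a raw pile of articles with no topic attached), here is a rough sketch of assigning each article to a category. It is only a keyword-overlap stand-in for a real data mining classifier. The function `classify`, the `CATEGORY_KEYWORDS` table, and the sample articles are all hypothetical and chosen just for illustration.

```python
def classify(tokens, category_keywords):
    """Return the category whose keyword set overlaps the article most, or None."""
    words = set(tokens)
    best, best_hits = None, 0
    for category, keywords in category_keywords.items():
        hits = len(words & keywords)  # number of shared words
        if hits > best_hits:
            best, best_hits = category, hits
    return best


# Hypothetical categories and keywords (assumptions, not project decisions).
CATEGORY_KEYWORDS = {
    "food": {"recipe", "restaurant", "noodle", "beef"},
    "travel": {"train", "hotel", "flight", "guide"},
}

# A raw data set: nothing tells us in advance which topic each article is about.
raw_articles = [
    "beef noodle recipe".split(),
    "cheap hotel and flight guide".split(),
    "stock market news".split(),
]
labels = [classify(article, CATEGORY_KEYWORDS) for article in raw_articles]
print(labels)
```

In the first case, where articles are crawled with topic keywords, each article already arrives with its topic, so a step like this would mostly repeat what the crawl did. An article that matches no keywords gets `None`, which is the kind of case where a real classifier or clustering method would be needed.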

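After talking with the professor, I realized that the goals we have set are still not specific enough.

From our current goals, the professor cannot tell us which techniques would suit us.

For the professor to understand our problem, we have to present our steps, or rather our architecture.

We need to show the professor the small goal of each stage, and then decide which methods suit us.

The conclusion is that we have to discuss every detail of what we want.

And to plan those details, we need enough background knowledge to imagine them.

Only then will the professor be able to correct our plan.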

subsevenx2001 commented 12 years ago

2012/4/10 blueworrybear <reply@reply.github.com>


Got it, I'll think about it.

blueworrybear / DolorMag

The things we have to think. #6
