Jane0901 / Machine-Learning


Discussion about Python on RStudio #19

Open PoMingChen opened 5 years ago

PoMingChen commented 5 years ago

CH3

  1. What does "tensor" mean?
  2. Is a Dense layer related to what's called a probability distribution?

    Supposed to be! If you are doing multi-class classification, the final output will be layers.Dense(N, ....). The N outputs form a probability distribution, and the sum of them is 1.
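A quick numpy-only illustration (not the book's code) of why the N outputs of a softmax layer form a probability distribution that sums to 1:

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability, then normalize.
    exps = np.exp(logits - np.max(logits))
    return exps / exps.sum()

# Raw scores from a hypothetical Dense(3) output layer for one sample.
logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
print(probs)        # three non-negative values
print(probs.sum())  # sums to 1 (up to float error)
```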

3.3

Can a Mac use its own GPU to set up a local workstation?

3.4

Does the Help pane in RStudio still work for certain Python functions?

Are "configure" and "compile" different things in the ML field?

What is the validation set for? (I've somewhat forgotten; noting it down for now.)

The validation set is the set of samples used during training to monitor the model's performance (split off in advance, as a fixed proportion of the training data). If the sample isn't large enough, K-fold validation is used as a workaround: the results are averaged at the end and taken as the final observation of how the training went.
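A minimal numpy sketch of the K-fold idea, with a placeholder in place of real training and evaluation:

```python
import numpy as np

data = np.arange(20)           # stand-in for 20 training samples
k = 4
fold_size = len(data) // k
scores = []
for fold in range(k):
    # Indices for this fold's validation slice.
    val_idx = np.arange(fold * fold_size, (fold + 1) * fold_size)
    val_fold = data[val_idx]                 # held-out validation fold
    train_fold = np.delete(data, val_idx)    # the remaining K-1 folds
    # ... train a model on train_fold, evaluate on val_fold ...
    scores.append(len(val_fold))  # placeholder for a real metric
final_score = np.mean(scores)     # average across the K folds
```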

What is the concept behind the contents of the square brackets in x_train[10000:]?
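The brackets are Python slice notation: lists and arrays are 0-indexed, and `a[start:stop]` excludes `stop`, unlike R's 1-indexed `a[start:stop]`. A small sketch (the 25,000 is made up):

```python
x_train = list(range(25000))   # stand-in for 25,000 samples

head = x_train[:10000]         # first 10000 items (indices 0..9999)
partial = x_train[10000:]      # everything from index 10000 to the end
print(len(head), len(partial)) # -> 10000 15000
```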

Later on Ch7

What is the functional API?

PoMingChen commented 5 years ago

CH3

Discussion notes from 3/26

  1. A Python list is roughly the equivalent of an R vector.

  2. Get clear on the one-hot encoding concept (section 6.1).

  3. Activation functions are responsible for introducing non-linearity into the neural network. If you run into a function or distribution you don't recognize, just google it.

  4. Note the difference between model.train & model.evaluate.

model.evaluate is mainly for observing, via plots or metrics (the loss function, etc.), how the model changes step by step over the course of training, so as to find the optimal number of training rounds and avoid overfitting. The training part, on the other hand, mainly covers pre-processing and the compile configuration.

  5. Recommended: consult the Keras cheatsheet. Of the steps define, compile, fit, evaluate, predict, do the middle three in Python; for everything else, go back to R for the data processing.
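Back to item 2: a minimal sketch of one-hot encoding in plain numpy (not Keras's own helper), where each label index becomes a row with a single 1:

```python
import numpy as np

def one_hot(labels, num_classes):
    # One row per label; put a 1 in the column given by the label index.
    out = np.zeros((len(labels), num_classes))
    out[np.arange(len(labels)), labels] = 1.0
    return out

labels = np.array([0, 2, 1])
encoded = one_hot(labels, 3)
# encoded:
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]]
```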

Potential issues

  1. (3.5, the Reuters news classification case) Just above Listing 3.16:

The best loss function to use in this case is categorical_crossentropy. It measures the distance between two probability distributions: here, between the probability distribution output by the network and the true distribution of the labels. By minimizing the distance between these two distributions, you train the network to output something as close as possible to the true labels.

This means the model is trained so that the predicted probability distribution over the 46 classes gets as close as possible to the true distribution of the labels in the training data. My question: the model's output is a set of probabilities over the 46 news categories, e.g. (politics, sports, economics) = (0.03, 0.05, 0.07, ...), but my label isn't necessarily in the form (politics, sports, economics), so how is the distance computed?

  2. Listing 3.28: is there a problem with model = build_model()?

  3. What are evaluation metrics for? (classification: accuracy; regression: MAE)
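A partial answer to question 1 above: the labels are first one-hot encoded, so the "true distribution" puts all of its mass on the correct class, and the crossentropy collapses to minus the log of the probability the model assigned to that class. A numpy sketch (the three classes and numbers are made up):

```python
import numpy as np

# Predicted probabilities over 3 hypothetical classes (politics, sports, economics).
pred = np.array([0.2, 0.7, 0.1])
true_label = 1                 # the sample's class index
true_dist = np.zeros(3)
true_dist[true_label] = 1.0    # one-hot: all probability mass on the true class

# Categorical crossentropy between the two distributions...
ce = -np.sum(true_dist * np.log(pred))
# ...which equals -log of the probability assigned to the true class.
assert abs(ce - (-np.log(pred[true_label]))) < 1e-12
```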

PoMingChen commented 5 years ago

Ch4

  1. The summary of 4.1: mini-batch or batch (what does "batch" mean?)

A small set of samples (typically between 8 and 128) that are processed simultaneously by the model. The number of samples is often a power of 2, to facilitate memory allocation on GPU. When training, a mini-batch is used to compute a single gradient-descent update applied to the weights of the model.

  2. 4.2.1 Simple hold-out validation (p. 122)

if different random shuffling rounds of the data before splitting end up yielding very different measures of model performance, then you’re having this issue

What does "shuffle" mean? (Is it something close in meaning to splitting?)

  3. 4.2.1 K-fold validation (p. 122)

Like hold-out validation, this method doesn’t exempt you from using a distinct validation set for model calibration

What does "exempt" mean?
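Following up on the batch question: the glossary definition above implies that the number of gradient-descent updates per epoch is the number of samples divided by the batch size, rounded up. A small arithmetic sketch (sample counts are made up):

```python
import math

num_samples = 1000
batch_size = 128           # typically between 8 and 128, often a power of 2
# One gradient-descent update per mini-batch:
updates_per_epoch = math.ceil(num_samples / batch_size)
print(updates_per_epoch)   # -> 8 (the last batch is smaller than 128)

epochs = 5                 # one epoch = one full pass over the data
total_updates = updates_per_epoch * epochs
print(total_updates)       # -> 40
```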

PoMingChen commented 5 years ago

4.3.1 Handling missing values (p. 125)

You may sometimes have missing values in your data. For instance, in the house-price example, the first feature (the column of index 0 in the data) was the per capita crime rate. What if this feature wasn't available for all samples? You'd then have missing values in the training or test data.

In general, with neural networks, it’s safe to input missing values as 0, with the condition that 0 isn’t already a meaningful value. The network will learn from exposure to the data that the value 0 means missing data and will start ignoring the value.

The key point here: to enter a deep-learning model, a tensor must consist entirely of numbers (floats), so NA cannot go in. If none of the sample data (expressed numerically, e.g. with one-hot encoding) ever takes the value 0, then 0 can be used to represent missing. By the same logic, if no sample ever takes the value 5, you could plug in 5 to represent it instead.

Note that if you’re expecting missing values in the test data, but the network was trained on data without any missing values, the network won’t have learned to ignore missing values! In this situation, you should artificially generate training samples with missing entries: copy some training samples several times, and drop some of the features that you expect are likely to be missing in the test data.

If you expect to encounter NAs in the test data, then the model must already have gained experience handling NAs during training; that is the fundamental principle. Concretely, you can manually add samples with NAs to the training set: copy some samples from the training set and, for the features you expect to be missing in the test set, set them to NA (or to the numeric value used to represent it).
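A numpy sketch of both steps, with made-up values (NaN plays the role of NA): replace missing entries with 0 before feeding the network, and synthesize training copies with some features dropped to 0:

```python
import numpy as np

# Hypothetical feature matrix with missing entries.
x = np.array([[1.2, np.nan, 3.0],
              [0.5, 2.2, np.nan]])
# Safe only if 0 isn't already a meaningful value for these features.
x_filled = np.nan_to_num(x, nan=0.0)

# Simulate missing values in training copies: duplicate the samples
# and zero out roughly 30% of the features at random.
rng = np.random.default_rng(0)
copies = x_filled.copy()
mask = rng.random(copies.shape) < 0.3
copies[mask] = 0.0
```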

PoMingChen commented 5 years ago

4.4.2 Adding weight regularization

it's done by adding to the loss function of the network a cost associated with having large weights.

Find a more intuitive, simple example on my own.
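A toy numeric attempt at that simpler example (all numbers are hypothetical): with L2 regularization, the cost added to the loss is the sum of squared weights scaled by a coefficient, so larger weights directly make the loss worse:

```python
import numpy as np

weights = np.array([0.5, -1.5, 2.0])   # hypothetical layer weights
base_loss = 0.8                        # hypothetical crossentropy loss
l2 = 0.01                              # regularization coefficient

penalty = l2 * np.sum(weights ** 2)    # cost grows with large weights
total_loss = base_loss + penalty       # what the optimizer actually minimizes
print(penalty, total_loss)
```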

What does "metric" mean in machine learning?

Find another more intuitive, simple example.

What is the concept of "epochs" in a neural network?

What is the concept of an "optimization configuration"?

Can we come up with another example?
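A toy example tying the last two questions together (everything here is made up for illustration): one epoch is one full pass over the training data, and the optimization configuration is the choice of how each update is applied (here, plain gradient descent with a fixed learning rate on a single parameter):

```python
# Minimize the average of (w - x)^2 over four samples.
data = [1.0, 2.0, 3.0, 4.0]
w = 0.0
learning_rate = 0.1          # part of the "optimization configuration"
epochs = 50
for _ in range(epochs):      # each iteration = one epoch (one full pass)
    for x in data:           # one gradient update per sample (batch size 1)
        grad = 2 * (w - x)   # gradient of (w - x)^2 with respect to w
        w -= learning_rate * grad
# w settles near a compromise value among the samples.
print(w)
```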