Pin-Jiun opened 1 year ago
Path.__truediv__ overloads the / operator and returns a new Path object, which is why path components can be joined with /.
from pathlib import Path
data_dir = "./Dataset"
mapping_path = Path(data_dir) / "mapping.json"
print(Path(data_dir))
print(mapping_path)
Dataset
Dataset\mapping.json
mapping.json has the following format:
{"speaker2id": {"id10001": 0, "id10005": 1, ...},
 "id2speaker": {"0": "id10001", "1": "id10005", ...}}
The speaker2id dictionary is obtained with the following processing:
import json

# Load the mapping from speaker name to the corresponding id.
mapping_path = Path(data_dir) / "mapping.json"
mapping = json.load(mapping_path.open())
self.speaker2id = mapping["speaker2id"]
speaker2id then has the form:
{"id10001":0,"id10005":1}
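As a small sketch (with made-up speaker ids mirroring the structure above), the same extraction can be reproduced with json.loads on an inline string instead of a file:

```python
import json

# Hypothetical mapping text with the same structure as mapping.json.
text = '{"speaker2id": {"id10001": 0, "id10005": 1}, "id2speaker": {"0": "id10001", "1": "id10005"}}'
mapping = json.loads(text)

speaker2id = mapping["speaker2id"]
print(speaker2id)  # {'id10001': 0, 'id10005': 1}
```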
metadata.json has the following format:
{
"n_mels": 40,
"speakers": {
"id10473": [
{
"feature_path": "uttr-5c88b2f1803449789c36f14fb4d3c1eb.pt",
"mel_len": 652
},
{
"feature_path": "uttr-022a67baccc54bfda3567a7ac282a7b8.pt",
"mel_len": 564
},
{
"feature_path": "uttr-6a5c6e7231d642568633db13b6e429e1.pt",
"mel_len": 952
}],
"id10328": [
{
"feature_path": "uttr-b3d49b13bb36497186c923cd8f1b811e.pt",
"mel_len": 989
},
{
"feature_path": "uttr-da81c8475be040eda05784eca77e106f.pt",
"mel_len": 412
}]
}
}
After the processing below, metadata is a dictionary whose keys are speaker ids and whose values are lists of dicts, each containing a feature_path and a mel_len.
# Load metadata of training data.
metadata_path = Path(data_dir) / "metadata.json"
metadata = json.load(open(metadata_path))["speakers"]
metadata then looks like:
{
"id10473": [
{
"feature_path": "uttr-5c88b2f1803449789c36f14fb4d3c1eb.pt",
"mel_len": 652
},
{
"feature_path": "uttr-022a67baccc54bfda3567a7ac282a7b8.pt",
"mel_len": 564
},
{
"feature_path": "uttr-6a5c6e7231d642568633db13b6e429e1.pt",
"mel_len": 952
}],
"id10328": [
{
"feature_path": "uttr-b3d49b13bb36497186c923cd8f1b811e.pt",
"mel_len": 989
},
{
"feature_path": "uttr-da81c8475be040eda05784eca77e106f.pt",
"mel_len": 412
}]
}
self.speaker_num = len(metadata.keys())  # total number of speakers
self.data = []
for speaker in metadata.keys():
    for utterances in metadata[speaker]:
        self.data.append([utterances["feature_path"], self.speaker2id[speaker]])
The length of metadata.keys() is the total number of speakers.
The nested loop flattens metadata into a 2-D list in which the first column is the feature_path and the second column is the speaker id. The training data looks like:
[["uttr-5c88b2f1803449789c36f14fb4d3c1eb.pt",0],
["uttr-022a67baccc54bfda3567a7ac282a7b8.pt",0],
["uttr-b3d49b13bb36497186c923cd8f1b811e.pt",1]
]
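The flattening step can be checked end to end with a toy metadata and speaker2id (file names shortened to a.pt, b.pt, c.pt for illustration):

```python
# Toy inputs mirroring the real structure; values are made up.
speaker2id = {"id10473": 0, "id10328": 1}
metadata = {
    "id10473": [{"feature_path": "a.pt", "mel_len": 652},
                {"feature_path": "b.pt", "mel_len": 564}],
    "id10328": [{"feature_path": "c.pt", "mel_len": 989}],
}

data = []
for speaker in metadata.keys():
    for utterance in metadata[speaker]:
        data.append([utterance["feature_path"], speaker2id[speaker]])

print(len(metadata.keys()))  # 2 speakers
print(data)  # [['a.pt', 0], ['b.pt', 0], ['c.pt', 1]]
```

Dicts preserve insertion order in Python 3.7+, so the flattened list comes out in a deterministic order.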
testdata.json has the following format:
{"n_mels": 40,
 "utterances": [
    {"feature_path": "uttr-b73206c2bc3d42bf9c77ad4565d4ff15.pt", "mel_len": 2112},
    {"feature_path": "uttr-8243426b1e3a4813aecda0df340d6a69.pt", "mel_len": 586},
    {"feature_path": "uttr-c704f91ff7124a2b9b2865a9afce8417.pt", "mel_len": 465}]}
class InferenceDataset(Dataset):
    def __init__(self, data_dir):
        testdata_path = Path(data_dir) / "testdata.json"
        metadata = json.load(testdata_path.open())
        self.data_dir = data_dir
        self.data = metadata["utterances"]
This yields the test data:
[{"feature_path": "uttr-b73206c2bc3d42bf9c77ad4565d4ff15.pt", "mel_len": 2112},
{"feature_path": "uttr-8243426b1e3a4813aecda0df340d6a69.pt", "mel_len": 586},
{"feature_path": "uttr-c704f91ff7124a2b9b2865a9afce8417.pt", "mel_len": 465}]
"uttr-b73206c2bc3d42bf9c77ad4565d4ff15.pt"
In machine learning, ".pt" is a common file extension for PyTorch's serialization format: anything saved with torch.save. Most often this is a trained model's parameters together with related information such as hyperparameter settings.
Saving a model's parameters to a .pt file lets us reload and reuse the model later, which avoids retraining and makes it easier to deploy the model in different environments: we load the .pt file and use the restored parameters for prediction, generation, and other tasks.
Note that .pt files are not limited to models. Here, each uttr-*.pt file stores a preprocessed mel-spectrogram feature tensor for one utterance, which can be loaded with torch.load.
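A minimal round-trip sketch of the .pt format, using a random tensor as a stand-in for one utterance's mel-spectrogram (the 652 x 40 shape is taken from the metadata example above; the values are made up):

```python
import os
import tempfile

import torch

# Fake mel-spectrogram features: 652 frames x 40 mel bins.
mel = torch.randn(652, 40)

path = os.path.join(tempfile.mkdtemp(), "uttr-example.pt")
torch.save(mel, path)      # serialize the tensor to a .pt file
loaded = torch.load(path)  # restore it

print(torch.equal(mel, loaded))  # True
```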
import torch
from torch.nn.utils.rnn import pad_sequence

def collate_batch(batch):
    """Collate a batch of data."""
    # Process features within a batch.
    mel, speaker = zip(*batch)
    # Because we train the model batch by batch, we need to pad the features in
    # the same batch so that their lengths are the same.
    mel = pad_sequence(mel, batch_first=True, padding_value=-20)  # pad with log 10^(-20), a very small value
    # mel: (batch size, length, 40)
    return mel, torch.FloatTensor(speaker).long()
batch is passed in as a tuple:
batch = (
(feature1, label1),
(feature2, label2),
(feature3, label3),
...
)
*batch first unpacks batch into:
(feature1, label1),
(feature2, label2),
(feature3, label3),
...
zip((feature1, label1),
(feature2, label2),
(feature3, label3),...)
zip combines the features into one new tuple and the labels into another:
mel= (feature1, feature2, feature3...)
speaker = (label1, label2, label3...)
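The unpack-and-pad behaviour can be verified with two dummy utterances of different lengths (the 3- and 5-frame shapes are made up for illustration):

```python
import torch
from torch.nn.utils.rnn import pad_sequence

batch = (
    (torch.zeros(3, 40), 0),  # 3-frame utterance, speaker 0
    (torch.ones(5, 40), 1),   # 5-frame utterance, speaker 1
)

mel, speaker = zip(*batch)  # tuple of features, tuple of labels
mel = pad_sequence(mel, batch_first=True, padding_value=-20)

print(mel.shape)            # torch.Size([2, 5, 40]) -- padded to the longest length
print(speaker)              # (0, 1)
print(mel[0, 3:].unique())  # tensor([-20.]) -- the padded frames
```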
self.encoder_layer = nn.TransformerEncoderLayer(
d_model=d_model, dim_feedforward=256, nhead=2
)
Parameters of nn.TransformerEncoderLayer:
d_model (int) – the number of expected features in the input (required); the encoder/decoder input size.
nhead (int) – the number of heads in the multi-head attention models (required).
dim_feedforward (int) – the dimension of the feedforward network model (default=2048); this is the hidden-layer width of the FFN inside each encoder layer, whose input can be the layer input or a hidden representation.
dropout (float) – the dropout value (default=0.1).
activation (Union[str, Callable[[Tensor], Tensor]]) – the activation function of the intermediate layer, either a string ("relu" or "gelu") or a unary callable. Default: relu.
layer_norm_eps (float) – the eps value in layer normalization components (default=1e-5).
batch_first (bool) – if True, the input and output tensors are provided as (batch, seq, feature). Default: False (seq, batch, feature).
norm_first (bool) – if True, layer norm is done prior to the attention and feedforward operations; otherwise it is done after. Default: False (after).
For comparison, the full nn.Transformer model has additional parameters: num_encoder_layers – the number of sub-encoder layers in the encoder (default=6); num_decoder_layers – the number of sub-decoder layers in the decoder (default=6); custom_encoder – a custom encoder (default=None); custom_decoder – a custom decoder (default=None); and there d_model defaults to 512.
https://pytorch.org/docs/stable/generated/torch.nn.TransformerEncoderLayer.html
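A quick shape check for the encoder layer above. The feature size d_model=80 is an assumption for illustration (it must be divisible by nhead); since batch_first defaults to False, the input is (seq, batch, feature), and the output keeps the input shape:

```python
import torch
import torch.nn as nn

d_model = 80  # assumed feature size; must be divisible by nhead
layer = nn.TransformerEncoderLayer(d_model=d_model, dim_feedforward=256, nhead=2)

# batch_first defaults to False, so the input is (seq, batch, feature).
x = torch.randn(652, 4, d_model)
out = layer(x)

print(out.shape)  # torch.Size([652, 4, 80]) -- same shape as the input
```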