Chandu-4444 opened this issue 2 years ago
I think you've re-added some files from TextModels.jl that we don't need; could you remove those? 🙂
Sure! Will clean up in the next commit.
julia> batches = FastText.load_batchseq(data, task)
julia> batches[1][1]
92-element Vector{Vector{Int64}}:
[25000, 25000, 25000, 25000, 25000, 25000, 25000, 25000]
[633779, 633779, 633779, 633779, 633779, 633779, 633779, 633779]
[2731, 34, 315, 354, 2087, 2209, 70, 1307]
[44047, 435, 633779, 633779, 6589, 633779, 633779, 205]
⋮
[0, 0, 0, 0, 0, 213, 0, 0]
[0, 0, 0, 0, 0, 25, 0, 0]
[0, 0, 0, 0, 0, 1778, 0, 0]
julia> batches[1][2]
8-element Vector{Int64}:
1
1
1
1
1
0
1
1
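The batch above is timestep-major: each inner vector holds one timestep across all 8 sequences in the batch, with shorter sequences right-padded (the trailing zeros). Flux provides `batchseq` for this; here is a dependency-free sketch of the same rearrangement (the pad index 0 is an assumption based on the output above):

```julia
# Sketch: pad a batch of token-index sequences to equal length and
# rearrange them timestep-major, mirroring the batches[1][1] layout above.
function batchseq_sketch(seqs::Vector{Vector{Int}}; pad::Int = 0)
    maxlen = maximum(length, seqs)
    # Right-pad every sequence to the longest length in the batch.
    padded = [vcat(s, fill(pad, maxlen - length(s))) for s in seqs]
    # One vector per timestep, each holding that timestep's token
    # for every sequence in the batch.
    return [[p[t] for p in padded] for t in 1:maxlen]
end

batch = batchseq_sketch([[25000, 2731, 44047], [25000, 34]])
# batch[1] == [25000, 25000]  (timestep 1 across both sequences)
# batch[3] == [44047, 0]      (second sequence is padded)
```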
Added a vocab_size keyword argument to TextClassificationSingle, and added <unk> and <pad> to the vocabulary.

Next:
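A minimal sketch of what adding `<unk>` and `<pad>` to the vocabulary implies (the helper names here are hypothetical, not the actual FastAI.jl implementation): the special tokens get fixed indices, and any out-of-vocabulary word maps to the `<unk>` index instead of erroring.

```julia
# Sketch: extend a vocabulary with special tokens and map words to indices,
# sending out-of-vocabulary words to "<unk>". Helper names are hypothetical.
function build_vocab(words::Vector{String})
    vocab = ["<unk>", "<pad>"]          # special tokens first
    append!(vocab, unique(words))
    return Dict(w => i for (i, w) in enumerate(vocab))
end

tokenize(ids::Dict{String,Int}, words) = [get(ids, w, ids["<unk>"]) for w in words]

ids = build_vocab(["the", "movie", "was", "great"])
tokenize(ids, ["the", "film", "was", "great"])
# "film" is unseen, so it maps to the "<unk>" index (1 here)
```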
Should the vocab CSV files be checked in? I would've assumed they would be artifacts or DataDeps as well.
julia> data, blocks = load(datarecipes()["imdb"])
((mapobs(loadfile, ObsView(::MLDatasets.FileDataset{typeof(identity), String}, ::Vector{Int64})), mapobs(parentname, ObsView(::MLDatasets.FileDataset{typeof(identity), String}, ::Vector{Int64}))), (Paragraph(), Label{String}(["neg", "pos"])))
julia> task = TextClassificationSingle(blocks, data)
SupervisedTask(Paragraph -> Label{String})
julia> model = FastAI.taskmodel(task, FastText.LanguageModel(false, task))
#90 (generic function with 1 method)
julia> batches = FastText.load_batchseq(data, task)
WARNING: both Losses and NNlib export "ctc_loss"; uses of it in module Flux must be qualified
6250-element Vector{Tuple{Vector{Vector{Int64}}, Flux.OneHotArray{UInt32, 2, 1, 2, Vector{UInt32}}}}:
([[35, 35, 35, 35], [3, 3, 3, 9], [40, 18025, 15, 14], [224, 10, 3541, 3040], [737, 34, 24, 505], [49, 7, 809, 3], [4, 4, 221, 3836], [1927, 104, 4,
3], [7, 16, 629, 28440], [6, 351, 7, 17] … [2, 2, 2, 44], [2, 2, 2, 3], [2, 2, 2, 9839], [2, 2, 2, 17], [2, 2, 2, 1041], [2, 2, 2, 27], [2, 2, 2, 3], [2, 2, 2, 3836], [2, 2, 2, 3], [2, 2, 2, 28440]], [0 0 1 1; 1 1 0 0])
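The `[0 0 1 1; 1 1 0 0]` target is a 2×4 one-hot matrix, one column per sample over the classes `["neg", "pos"]`. Flux's `onehotbatch` produces this encoding; a plain-Julia sketch of the same idea:

```julia
# Sketch: one-hot encode labels column-wise, as Flux.onehotbatch does for
# the Label{String}(["neg", "pos"]) block above.
function onehot_cols(labels::Vector{String}, classes::Vector{String})
    m = zeros(Int, length(classes), length(labels))
    for (j, l) in enumerate(labels)
        m[findfirst(==(l), classes), j] = 1
    end
    return m
end

onehot_cols(["pos", "pos", "neg", "neg"], ["neg", "pos"])
# → [0 0 1 1; 1 1 0 0], matching the batch target shown above
```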
julia> using FluxTraining
julia> td, vd = splitobs(batches, at=0.9)
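`splitobs(batches, at=0.9)` from MLUtils takes a contiguous split: the first 90% of observations become the training set and the rest the validation set. A plain-Julia sketch of the same split on the 6250 batches above:

```julia
# Sketch: a contiguous 90/10 train/validation split, like
# MLUtils.splitobs(data; at = 0.9) used above.
function split_at(data::AbstractVector, at::Float64)
    n = floor(Int, at * length(data))
    return data[1:n], data[n+1:end]
end

td, vd = split_at(collect(1:6250), 0.9)
length(td), length(vd)
# → (5625, 625)
```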
julia> using Flux
julia> learner = Learner(model, Flux.Losses.logitcrossentropy, callbacks=[Metrics(accuracy)]; data=(td, vd))
Learner()
julia> fit!(learner, 1)
Epoch 1 TrainingPhase(): 0%|█ | ETA: 4 days, 3:35:31
The changes have been merged in #258.
Can do the following: