bojone / bert4keras

keras implement of transformers for humans
https://kexue.fm/archives/6915
Apache License 2.0

ALBERT training: loss is NaN #361

Open ganchengguang opened 3 years ago

ganchengguang commented 3 years ago


Basic information

Core code


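from bert4keras.backend import keras
from bert4keras.models import build_transformer_model
from bert4keras.optimizers import Adam, extend_with_piecewise_linear_lr
from keras.layers import Lambda, Dense
import tensorflow as tf

num_classes = 2  # number of softmax outputs; must equal the number of distinct labels
maxlen = 128
batch_size = 4
config_path = 'E:/bert4keras-master/albert_large/albert_config.json'
checkpoint_path = 'E:/bert4keras-master/albert_large/model.ckpt-best'
spm_path = 'E:/bert4keras-master/albert_large/30k-clean.model'
dict_path = 'E:/bert4keras-master/albert_large/30k-clean.vocab'

# Load the pre-trained ALBERT model
bert = build_transformer_model(
    config_path=config_path,
    checkpoint_path=checkpoint_path,
    model='albert',
    return_keras_model=False,
)

# Classify from the [CLS] vector (position 0 of the last hidden layer)
output = Lambda(lambda x: x[:, 0], name='CLS-token')(bert.model.output)
output = Dense(
    units=num_classes,
    activation='softmax',
    kernel_initializer=bert.initializer
)(output)
output = Lambda(lambda x: x + 1e-8)(output)  # adds a small epsilon to the probabilities
print('output:', output)
model = keras.models.Model(bert.model.input, output)
model.summary()

ad = tf.compat.v1.train.AdamOptimizer(learning_rate=1e-5)  # defined but not used below

# Derive an Adam optimizer with a piecewise-linear learning-rate schedule
AdamLR = extend_with_piecewise_linear_lr(Adam, name='AdamLR')

model.compile(
    loss='sparse_categorical_crossentropy',
    # optimizer=AdamLR(1e-5),  # use a sufficiently small learning rate
    optimizer=AdamLR(learning_rate=1e-6, lr_schedule={
        1000: 1,
        2000: 0.1
    }),
    metrics=['accuracy'],
)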
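In `lr_schedule={1000: 1, 2000: 0.1}` the keys are training steps and the values multiply the base `learning_rate`. My reading of bert4keras's piecewise-linear schedule is: the multiplier starts at 0 at step 0, rises linearly to 1 at step 1000 (warm-up), falls linearly to 0.1 at step 2000, and stays at 0.1 afterwards. The pure-Python sketch below encodes that reading so the effective learning rate at any step can be checked by hand; `piecewise_linear_multiplier` is a hypothetical helper, not the library's function, and the start-from-zero behaviour is an assumption worth confirming against the bert4keras source.

```python
def piecewise_linear_multiplier(step, schedule, from_zero=True):
    """Learning-rate multiplier at `step` for a {step: multiplier} schedule.

    Assumption: the curve starts at (0, 0.0) unless the schedule already
    defines step 0; between points it is linearly interpolated, and after
    the last point it stays constant.
    """
    points = sorted(schedule.items())
    if from_zero and points[0][0] != 0:
        points = [(0, 0.0)] + points
    if step <= points[0][0]:
        return points[0][1]
    # Find the segment containing `step` and interpolate inside it
    for (t0, m0), (t1, m1) in zip(points, points[1:]):
        if step <= t1:
            return m0 + (m1 - m0) * (step - t0) / (t1 - t0)
    return points[-1][1]


base_lr = 1e-6
schedule = {1000: 1, 2000: 0.1}
for step in (0, 500, 1000, 1500, 2000, 5000):
    print(step, base_lr * piecewise_linear_multiplier(step, schedule))
```

Under this reading, with `learning_rate=1e-6` the effective rate is 5e-7 at step 500, peaks at 1e-6 at step 1000, and settles at 1e-7 from step 2000 on.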

Output


CLS-token (Lambda)     (None, 1024)   0      Transformer-FeedForward-Norm[23][
dense_7 (Dense)        (None, 2)      2050   CLS-token[0][0]
lambda_1 (Lambda)      (None, 2)      0      dense_7[0][0]

Total params: 16,636,418
Trainable params: 16,636,418
Non-trainable params: 0


WARNING:tensorflow:From E:\anaconda3\envs\py36gputrain\lib\site-packages\tensorflow\python\ops\math_grad.py:1250: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
WARNING:tensorflow:From E:\anaconda3\envs\py36gputrain\lib\site-packages\keras\backend\tensorflow_backend.py:422: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

Epoch 1/20
1343/4408 [========>.....................] - ETA: 7:26 - loss: nan - accuracy: 0.6292
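For context on the `loss: nan` above: the softmax head produces `num_classes` probabilities, and `sparse_categorical_crossentropy` scores each sample as the negative log of the probability at its integer label. The pure-Python sketch below (an illustration, not Keras's implementation; `softmax` and `sparse_ce` are hypothetical helpers) mirrors that computation. It clips probabilities before the log, which as far as I know is roughly what the Keras TensorFlow backend already does with its epsilon, so the extra `x + 1e-8` Lambda adds little. What clipping cannot fix is a label with no matching output: a 2-unit head can only score labels 0 and 1.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_ce(probs, label, eps=1e-7):
    """Cross-entropy for one sample with an integer class label.

    Probabilities are clipped to [eps, 1 - eps] before the log, so a
    near-zero probability cannot produce log(0). A label outside
    [0, len(probs)) is rejected explicitly, since there is no
    probability to read for it.
    """
    num_classes = len(probs)
    if not 0 <= label < num_classes:
        raise ValueError(f"label {label} outside [0, {num_classes})")
    p = min(max(probs[label], eps), 1 - eps)
    return -math.log(p)


probs = softmax([0.0, 0.0])  # 2-unit head with uninformative logits
print(sparse_ce(probs, 0))   # about 0.6931, i.e. log(2)
try:
    sparse_ce(probs, 4)      # index 4 is 'others' in the loader's label map
except ValueError as err:
    print(err)
```

In the loader shown later in this thread, `others` maps to index 4, so the second call is exactly the kind of sample a 2-unit head cannot score.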

What I tried

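I adjusted the activation function and lowered the learning rate, but neither helped. The dataset is small, only 28,000 samples, and dirty data is already skipped with `continue`. I'm using the original English model for an English sentence-classification task. Could you give me some pointers, Master Su?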

ganchengguang commented 3 years ago

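What I tried:

Epoch 3/20
3/3 [==============================] - 0s 77ms/step - loss: 0.6158 - accuracy: 0.6667

It looks like a data problem. My own dataset trains without problems on your BERT model, but with ALBERT the loss becomes NaN. When I train the ALBERT model on only a few samples, the loss comes out normally (output above). How should I fix this?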

ganchengguang commented 3 years ago

Here is what a training sample looks like. Only the second field (the label) is used for classification:

meta others Jitesh Vishwakarma
meta others E-mail-Id: - jvishwakarma123@gmail.com
meta others Contact Number: - 9960902548
header experience PROFESSIONAL SUMMARY:
content experience · 4 years of technical experience in implementation, customization, integration and support of business application system.
content experience · Having Domain Experience in PAYMENT, AUTOMOBILE and HEALTH-CARE.
content experience · Experienced in developing Web based applications with J2EE, JSP, Servlets, JDBC, Spring, Hibernate.
content experience · Experience in designing, developing and deploying J2EE application on IBM WebSphere/Web Logic Application Servers, Tomcat, etc.
content experience · Exposure to AGILE methodologies.
content experience · Hands on exposure to multiple Application Servers like GLASSFISH, and IBM Web Sphere Server.
content experience · Expertise in back-end procedure development, for Database Applications using ORACLE and SQL Server.
header knowledge TECHNICAL SKILLS:
content knowledge · Software Products: Apache Tomcat Server, WebSphere Application Server, JBoss, GlassFish.
content knowledge · Language: Java, J2EE, Spring, Hibernate, EJB, Webservice, XML, HTML, PL/SQL, Java Script
content knowledge · Tools: Eclipse, Web Sphere MQ,SOAPUI,HP ALM
content knowledge · Databases: Oracle ,SQL
content knowledge · Version Control Tools: SVN,Git, Perforce.
header others EMPLOYMENT HISTORY
content others · Currently working with FundTech India Pvt Ltd, Pune from June 2017 to Till Date.
content others · PVMSys InfraSolution Pvt Ltd,Pune from May 2015 to June 2017.
content others · Rechnetic System, Pune from April 2014 to May 2015
header experience WORK EXPERIENCE
content experience Fundtech India Pvt Ltd,Pune (June-2017 to till date)
content experience Product: GPP (Global PAYplus).
header experience Product Summary:
content experience Global PAY plus–Services Platform (GPP-SP) solution is a payments platform that is used by many of the world’s largest global and domestic banks. A centralized high-performance payment hub developed using service-oriented architecture (SOA) allows the bank to offer their customers new levels of business and functional payment service capabilities on an affordable processing platform.
content experience GPP-SP combines the agility of SOA with the flexibility of a rules-based system.
content experience The rules engine is designed to work in a global environment using the language of business users. Rules can be applied without the need for additional programming. By eliminating the need to re-code, users can save time and money while significantly improving their time-to-market for new products. In addition, GPP-SP offers customers the flexibility to adapt to change. GPP-SP users can add a new feature to a product simply by changing a rule. In the past, adding such an enhancement would have taken many months.
content experience Technology: CoreJava, Spring, Hibernate, Oracle Sql Developer, Web Logic Server, IBM Web Sphere MQ, Agile Development,JMS, SoapUI, Perforce ,HP ALM(QC).
header experience Responsibilities:
content experience · Certify, develop and Implement new feature in product as per the business requirement for particular banks and clearing house by following standards.
content experience · Involved in Product certification for Bank of Ireland for Mass Payments (MP).
content experience · Involved In fixing defects in product, testing and automating the test cases.
content experience · Involved In participation of regular sprint planning status meetings to discuss the risk arising in ongoing sprint with teammates and team lead.
content experience · Have individual contribute to coordinate/communicate with onsite team and Product owner.
content experience · Have worked on Java, Spring, Hibernate, Oracle Sql developer, IBM Websphere, SoapUI, JMS.
meta others .
content experience PVMSys InfraSolutions Pvt Ltd,Pune (May-2015 to June 2017)
content experience Product: PVMSys Framework
header project Project Summary:
content experience PVMSys Framework is based in Test Data Management system, related to Engine part of vehicle. Facilitates workflow in automobile testing domain and manage all the test data generated at different test Cells in to central system where some decision making logic is there to decide that how many test needs to be conducted on a particular model before issuing a test report and also provides Report, Localization, Email Alerts and test data analysis.
content experience Technology: CoreJava, J2EE, SpringMVC, Hibernate, XML, OSGI, ASM Commander,SVN
header experience Responsibilities:
content experience · Involved in Coding, Deployment.
content experience · Develop new enhancement as per the requirement and Sending daily status report.
content experience · Leading with 4 member team and allocating task.
content experience · Work on core part of OSGI framework and Flex.
content experience · Responsibility for debugging and maintaining the code.
content experience Rechnetic System, Pune (April-2014 to May-2015)
header project Project # 1: Gripic.in
header project Project Summary:
content experience To Search the Doctor and Hospital from the Particular Area, Location. In this application patient where he/she can do online appointment as per the doctor time slot. Doctor can update his time slot as per his availability .Patient can give rating and review to doctor or hospital. It’s a web based application worked on Communication and mailing functionality.
content experience Technology: Core Java (OOPS and Collection), JSP, Spring, Hibernate, HTML, JavaScript.
header experience Responsibilities:
content experience · Work on application technology Java ,SpringMVC ,Hibernate .
content experience · Develop Mailing Configuration as per appointment .
content experience · Involved in understanding the functional and requirement.
content experience · Involved in Coding from end-to-end application.
header education EDUCATIONAL QUALIFICATION
content education 2014 M.C.A from ASM’s College of Commerce, Science and Information Technology, Pune University
content education 2011 B.C.A from ASM’s College of Commerce, Science and Information Technology, Pune University.
content education 2008 H.S.C from Shri Swami Samarth Junior College, Maharashtra Board.
content education 2006 S.S.C from Shri Swami Samarth English Medium School, Maharashtra Board.
header others PERSONAL DETAILS
meta others · Name : Jitesh Vishwakarma
meta others · Date of Birth: 16th Feb, 1991.
meta others · Address: Sr.no-23, Hanuman Nagar, Bhagat Wasti, Bhosari, Pune-411026.
meta knowledge · Language knows: English and Hindi.
meta others · Passport Number : E3566819
header others Date :
meta others Place: PUNE
meta others Jitesh Vishwakarma.
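Each record above is one line with three tab-separated fields (`line_type`, `line_label`, sentence), which is the layout the loader's `line.strip().split('\t')` expects. That bare three-way unpacking raises `ValueError` on any line that does not split into exactly three fields, such as a blank line. A minimal, more forgiving parsing sketch, assuming that layout: `parse_record` is a hypothetical helper that splits at most twice (so a stray tab inside the sentence is kept) and returns `None` for malformed lines so the caller can skip them.

```python
def parse_record(line):
    """Split one 'line_type<TAB>line_label<TAB>sentence' record.

    Returns a (line_type, line_label, sentence) tuple, or None for blank
    or malformed lines so the caller can skip them.
    """
    parts = line.rstrip('\n').split('\t', 2)  # at most 3 fields
    if len(parts) != 3:
        return None
    line_type, line_label, sentence = (p.strip() for p in parts)
    return line_type, line_label, sentence


print(parse_record('header\teducation\tEDUCATIONAL QUALIFICATION\n'))
print(parse_record('\n'))  # None: blank line
```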

ganchengguang commented 3 years ago

After I deleted all of the education and project samples, the loss comes out normally, so it should be a data problem. But how do I fix it? My current sample-loading code looks like this:

import os

line_label = {0: 'experience', 1: 'knowledge', 2: 'education', 3: 'project', 4: 'others'}
label2index = {v: k for k, v in line_label.items()}


def load_text_label_pairs(data_dir_path, label_type=None):
    if label_type is None:
        label_type = 'line_type'

    result = []

    for f in os.listdir(data_dir_path):
        data_file_path = os.path.join(data_dir_path, f)
        if os.path.isfile(data_file_path) and f.lower().endswith('.txt'):
            with open(data_file_path, mode='rt', encoding='utf8') as file:
                for line in file:
                    # Each record is: line_type<TAB>line_label<TAB>sentence.
                    # This local `line_label` (a string) shadows the
                    # module-level dict of the same name.
                    line_type, line_label, sentence = line.strip().split('\t')
                    # Skip labels that are not in the label map
                    if line_label not in label2index:
                        continue
                    if label_type == 'line_type':
                        result.append((sentence, line_type))
                    else:
                        result.append((sentence, label2index[line_label]))
    return result
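Before training, it is cheap to check the loaded labels against the model's output size. `sparse_categorical_crossentropy` expects every integer label to lie in `[0, num_classes)`. TensorFlow's documentation for `tf.nn.sparse_softmax_cross_entropy_with_logits` states that out-of-range labels raise an error on CPU but return NaN for the corresponding loss rows on GPU, which is consistent with the symptom here: `label2index` produces 0 to 4 while `num_classes = 2`. The sketch below assumes the `(sentence, label_index)` pairs that `load_text_label_pairs` returns when `label_type` is not `'line_type'`; `check_labels` is a hypothetical helper.

```python
from collections import Counter


def check_labels(pairs, num_classes):
    """Count integer labels and report any outside [0, num_classes).

    `pairs` is a list of (sentence, label_index) tuples.
    Returns (label_counts, out_of_range_labels).
    """
    counts = Counter(label for _, label in pairs)
    bad = sorted(lab for lab in counts if not 0 <= lab < num_classes)
    return counts, bad


# One sentence per label, taken from the sample resume above
pairs = [
    ('PROFESSIONAL SUMMARY:', 0),      # experience
    ('TECHNICAL SKILLS:', 1),          # knowledge
    ('EDUCATIONAL QUALIFICATION', 2),  # education
    ('Project Summary:', 3),           # project
    ('EMPLOYMENT HISTORY', 4),         # others
]

counts, bad = check_labels(pairs, num_classes=2)
print(bad)  # [2, 3, 4]: a 2-unit softmax head has no output for these
counts, bad = check_labels(pairs, num_classes=5)
print(bad)  # []
```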
bojone commented 3 years ago

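So you're just blindly copying code without making any effort to understand it? This is clearly a multi-class problem, and yet you still have num_classes=2.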

ganchengguang commented 3 years ago

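> So you're just blindly copying code without making any effort to understand it? This is clearly a multi-class problem, and yet you still have num_classes=2.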

Sorry, Master Su, I forgot to change units. I wasn't blindly copying code. Really sorry!
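Per the exchange above, the head needs one output unit per label, i.e. `num_classes = len(label2index)`, which is 5 for the loader's label map. The `dense_7` row of the posted model summary is a quick way to confirm the change took effect: a `Dense` layer with bias has `input_dim * units + units` parameters, so the posted 2050 = 1024 × 2 + 2 shows a 2-unit head, while a 5-unit head would show 1024 × 5 + 5 = 5125. A small sketch of that arithmetic; `dense_param_count` is a hypothetical helper.

```python
def dense_param_count(input_dim, units, use_bias=True):
    """Trainable weights in a fully connected layer: kernel plus optional bias."""
    return input_dim * units + (units if use_bias else 0)


label2index = {'experience': 0, 'knowledge': 1, 'education': 2,
               'project': 3, 'others': 4}
num_classes = len(label2index)  # one softmax unit per label
hidden_size = 1024              # width of the ALBERT-large [CLS] vector in the summary

print(dense_param_count(hidden_size, 2))            # 2050, the posted dense_7 row
print(dense_param_count(hidden_size, num_classes))  # 5125 with all five labels
```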